Abstract. The goal of the present paper is to provide a systematic and comprehensive
study of rational stochastic languages over a semiring $K \in \{\mathbb{Q}, \mathbb{Q}^+, \mathbb{R}, \mathbb{R}^+\}$.
A rational stochastic language is a probability distribution over a free monoid $\Sigma^*$
which is rational over $K$, that is, which can be generated by a multiplicity automaton
with parameters in $K$. We study the relations between the classes of rational stochastic
languages $\mathcal{S}_K^{rat}(\Sigma)$. We define the notion of residual of a stochastic language
and we use it to investigate properties of several subclasses of rational stochastic
languages. Lastly, we study the representation of rational stochastic languages by means
of multiplicity automata.

1 Introduction



In probabilistic grammatical inference, data often arise in the form of a finite sequence
of words $w_1, \ldots, w_n$ over some predefined alphabet $\Sigma$. These words are assumed to
be independently drawn according to a fixed but unknown probability distribution over
$\Sigma^*$. Probability distributions over free monoids $\Sigma^*$ are called stochastic languages.
A usual goal in grammatical inference is to try to infer an approximation of this
distribution in some class of probabilistic models, such as probabilistic automata. A
probabilistic automaton (PA) is composed of a structure, which is a finite automaton (NFA),
and parameters associated with states and transitions, which represent the probability
for a state to be initial or terminal, or the probability for a transition to be chosen.
It can easily be shown that probabilistic automata have the same expressivity as Hidden
Markov Models (HMM), which are heavily used in statistical inference [DDE05].
Given the structure A of a probabilistic automaton and a sequence of words S, computing
parameters for A which maximize the likelihood of S is NP-hard [AW92]. In practical
cases however, algorithms based on the EM (Expectation-Maximization) method [DLR77]
can be used to compute approximate values. On the other hand, inferring a probabilistic
automaton (structure and parameters) from a sequence of words is a widely open field of
research. Most results obtained so far only deal with restricted subclasses of PA, such
as Probabilistic Deterministic Automata (PDA), i.e. probabilistic automata whose
structure is deterministic (DFA), or Probabilistic Residual Automata (PRA), i.e.
probabilistic automata whose structure is a residual finite state automaton (RFSA)
[CO94,CO99,dlHT00,ELDD02,DE04].

In other respects, it can be noticed that stochastic languages are particular cases
of formal power series and that probabilistic automata are also particular cases of
multiplicity automata, notions which have been extensively studied in the field of
formal language theory [SS78,BR84,Sak03]. Therefore, stochastic languages which can
be generated by multiplicity automata are special cases of rational languages. We call
them rational stochastic languages. The goal of the present paper is to provide a
systematic and comprehensive study of rational stochastic languages so as to bring out
properties that could be useful for grammatical inference. Indeed, considering the
objects to infer as special cases of rational languages makes it possible to use the
powerful theoretical tools that have been developed in that field and hence to answer
many questions that naturally arise when working with them: is it possible to decide
within polynomial time whether two probabilistic automata generate the same stochastic
language? Does allowing negative coefficients in probabilistic automata extend the class
of generated stochastic languages? Can a rational stochastic language which takes all
its values in $\mathbb{Q}$ always be generated by a multiplicity automaton with coefficients
in $\mathbb{Q}$? And so forth. Also, studying rational stochastic languages for themselves,
considered as objects of language theory, helps to bring out notions and properties
which are important in a grammatical inference perspective: for example, we show that
the notion of residual language (or derivative), so important for grammatical inference
[DLT02,DLT04], has a natural counterpart for stochastic languages [DE03], which can be
used to express many properties of classes of stochastic languages.

Formal power series take their values in a semiring $K$: let us denote by
$K\langle\langle\Sigma\rangle\rangle$ the set of all formal power series. Here, we only consider the
semirings $\mathbb{Q}$, $\mathbb{Q}^+$, $\mathbb{R}$ and $\mathbb{R}^+$. For any such semiring $K$, we define the set
$\mathcal{S}_K^{rat}(\Sigma)$ of rational stochastic languages as the set of stochastic languages
over $\Sigma$ which are rational languages over $K$. For any two distinct semirings $K$ and
$K'$, the corresponding sets of rational stochastic languages are distinct. We show that
$\mathbb{R}$ is a Fatou extension of $\mathbb{Q}$ for stochastic languages, which means that any rational
stochastic language over $\mathbb{R}$ which takes its values in $\mathbb{Q}$ is also rational over $\mathbb{Q}$.
However, $\mathbb{R}^+$ is not a Fatou extension of $\mathbb{Q}^+$ for stochastic languages: there exists
a rational stochastic language over $\mathbb{R}^+$ which takes its values in $\mathbb{Q}^+$ and which is
not rational over $\mathbb{Q}^+$.

For any stochastic language $p$ over $\Sigma$ and any word $u$ such that $p(u\Sigma^*) \neq 0$,
let us define the residual language $u^{-1}p$ of $p$ with respect to $u$ by
$u^{-1}p(w) = p(uw)/p(u\Sigma^*)$: residual languages clearly are stochastic languages. We
show that the residual languages of a rational stochastic language $p$ over $K$ are also
rational over $K$. The residual subsemimodule $[Res(p)]$ of $K\langle\langle\Sigma\rangle\rangle$ spanned
by the residual languages of any stochastic language $p$ may be used to express the
rationality of $p$: $p$ is rational iff $[Res(p)]$ is included in a finitely generated
subsemimodule of $K\langle\langle\Sigma\rangle\rangle$. But when $K$ is positive, i.e. $K = \mathbb{Q}^+$ or
$K = \mathbb{R}^+$, it may happen that $[Res(p)]$ itself is not finitely generated. We study the
properties of two subclasses of $\mathcal{S}_K^{rat}(\Sigma)$: the set $\mathcal{S}_K^{fingen}(\Sigma)$ composed
of rational stochastic languages over $K$ whose residual subsemimodule is finitely
generated, and the set $\mathcal{S}_K^{fin}(\Sigma)$ composed of rational stochastic languages over
$K$ which have finitely many residual languages. We show that for any of these two
classes, $\mathbb{R}^+$ is a Fatou extension of $\mathbb{Q}^+$: any stochastic language of
$\mathcal{S}_{\mathbb{R}^+}^{fingen}(\Sigma)$ (resp. of $\mathcal{S}_{\mathbb{R}^+}^{fin}(\Sigma)$) which takes its values in
$\mathbb{Q}^+$ is an element of $\mathcal{S}_{\mathbb{Q}^+}^{fingen}(\Sigma)$ (resp. of $\mathcal{S}_{\mathbb{Q}^+}^{fin}(\Sigma)$). We
also show that for any element $p$ of $\mathcal{S}_K^{fingen}(\Sigma)$, there exists a unique minimal
subset of residual languages of $p$ which generates $[Res(p)]$.

Then, we study the representation of rational stochastic languages by means of
multiplicity automata. We first show that the set of multiplicity automata with
parameters in $\mathbb{Q}$ which generate stochastic languages is not recursive. Moreover, it
contains no recursively enumerable subset capable of generating the whole set of
rational stochastic languages over $\mathbb{Q}$. A stochastic language $p$ is a formal series
which has two properties: (i) $p(w) \geq 0$ for any word $w$, (ii) $\sum_w p(w) = 1$. We show
that the undecidability comes from the first requirement, since the second one can be
decided within polynomial time. We show that the set of stochastic languages which can
be generated by probabilistic automata with parameters in $\mathbb{Q}^+$ (resp. $\mathbb{R}^+$) exactly
coincides with $\mathcal{S}_{\mathbb{Q}^+}^{rat}(\Sigma)$ (resp. $\mathcal{S}_{\mathbb{R}^+}^{rat}(\Sigma)$). A probabilistic automaton
$A$ is called a Probabilistic Residual Automaton (PRA) if the stochastic languages
associated with its states are residual languages of the stochastic language $p_A$
generated by $A$. We show that the set of stochastic languages that can be generated by
probabilistic residual automata with parameters in $\mathbb{Q}^+$ (resp. $\mathbb{R}^+$) exactly coincides
with $\mathcal{S}_{\mathbb{Q}^+}^{fingen}(\Sigma)$ (resp. $\mathcal{S}_{\mathbb{R}^+}^{fingen}(\Sigma)$). We do not know whether the
class of PRA is decidable. However, we describe two decidable subclasses of PRA capable
of generating $\mathcal{S}_K^{fingen}(\Sigma)$ when $K = \mathbb{Q}^+$ or $K = \mathbb{R}^+$: the class of $K$-reduced
PRA and the class of prefixial PRA. The first one provides minimal representations in
the class of PRA, but we show that the membership problem is PSPACE-complete. The second
one produces more cumbersome representations, but the membership problem is polynomial.
Finally, we show that the set of stochastic languages that can be generated by
probabilistic deterministic automata with parameters in $\mathbb{Q}^+$ (resp. $\mathbb{R}^+$) exactly
coincides with $\mathcal{S}_{\mathbb{Q}^+}^{fin}(\Sigma)$, which is also equal to $\mathcal{S}_{\mathbb{Q}}^{fin}(\Sigma)$
(resp. $\mathcal{S}_{\mathbb{R}^+}^{fin}(\Sigma)$, which is also equal to $\mathcal{S}_{\mathbb{R}}^{fin}(\Sigma)$).

We recall some properties on rational series, stochastic languages and multiplicity 
automata in Section 2. We define and study rational stochastic languages in Section 3. 
The relations between the classes of rational stochastic languages are studied in Sub- 
section 3.1. Properties of the residual languages of rational stochastic languages are 
studied in Subsection 3.2. A characterisation of rational stochastic languages in terms 
of stable subsemimodules is given in Subsection 3.3. Classes $\mathcal{S}_K^{fingen}(\Sigma)$ and $\mathcal{S}_K^{fin}(\Sigma)$
are defined and studied in Subsection 3.4. The representation of rational stochastic 
languages by means of multiplicity automata is given in Section 4. 



2 Preliminaries 
2.1 Rational series 

In this section, we recall some definitions and results on rational series. For more 
information, we invite the reader to consult [SS78,BR84,Sak03]. 

Let $\Sigma$ be a finite alphabet, and $\Sigma^*$ be the set of words on $\Sigma$. The empty word is
denoted by $\varepsilon$ and the length of a word $u$ is denoted by $|u|$. The number of occurrences
of the letter $x$ in the word $w$ is denoted by $|w|_x$. For any integer $k$, we denote by
$\Sigma^k$ the set $\{u \in \Sigma^* \mid |u| = k\}$ and by $\Sigma^{\leq k}$ the set
$\{u \in \Sigma^* \mid |u| \leq k\}$. We denote by $<$ the length-lexicographic order on $\Sigma^*$.
For any word $u \in \Sigma^*$ and any language $L \subseteq \Sigma^*$, let
$uL = \{uv \in \Sigma^* \mid v \in L\}$ and $u^{-1}L = \{v \in \Sigma^* \mid uv \in L\}$. A subset $P$
of $\Sigma^*$ is prefixial if for any $u, v \in \Sigma^*$, $uv \in P \Rightarrow u \in P$.

A semiring is a set $K$ with two binary operations $+$ and $\cdot$ and two constant elements
$0$ and $1$ such that

1. $(K, +, 0)$ is a commutative monoid,

2. $(K, \cdot, 1)$ is a monoid,

3. the distribution laws $a \cdot (b + c) = a \cdot b + a \cdot c$ and $(a + b) \cdot c = a \cdot c + b \cdot c$ hold,

4. $0 \cdot a = a \cdot 0 = 0$ for every $a$.

A semiring is positive if the sum of two elements different from $0$ is different from $0$.

The semirings we consider here are the field of rational numbers $\mathbb{Q}$, the field of
real numbers $\mathbb{R}$, and $\mathbb{Q}^+$ and $\mathbb{R}^+$, respectively the non negative elements of $\mathbb{Q}$
and $\mathbb{R}$; $\mathbb{Q}^+$ and $\mathbb{R}^+$ are positive semirings.

Let $\Sigma$ be a finite alphabet and $K$ a semiring. A formal power series is a mapping $r$
of $\Sigma^*$ into $K$. The values $r(w)$ where $w \in \Sigma^*$ are referred to as the coefficients
of the series, and $r$ is written as a formal sum $r = \sum_{w \in \Sigma^*} r(w) w$. The set of
all formal power series is denoted by $K\langle\langle\Sigma\rangle\rangle$. Given $r$, the subset of
$\Sigma^*$ defined by $\{w \mid r(w) \neq 0\}$ is the support of $r$ and denoted by $supp(r)$. A
polynomial is a series whose support is finite. The subset of $K\langle\langle\Sigma\rangle\rangle$
consisting of all polynomials is denoted by $K\langle\Sigma\rangle$.

We denote by $0$ the series all of whose coefficients equal $0$. We denote by $1$ the
series whose coefficient for $\varepsilon$ equals $1$, the remaining coefficients being equal to $0$.
The sum of two series $r$ and $r'$ in $K\langle\langle\Sigma\rangle\rangle$ is defined by
$r + r' = \sum_{w \in \Sigma^*} (r(w) + r'(w)) w$. The multiplication of a series $r$ by a scalar
$a \in K$ is defined by $ar = \sum_{w \in \Sigma^*} a \cdot r(w) w$. The Cauchy product of two series
$r$ and $r'$ is defined by $rr' = \sum_{w \in \Sigma^*} \left(\sum_{w_1 w_2 = w} r(w_1) \cdot r'(w_2)\right) w$.
These operations furnish $K\langle\langle\Sigma\rangle\rangle$ with the structure of a semiring with
$K\langle\Sigma\rangle$ as a subsemiring. The Hadamard product of two series $r$ and $r'$ is
defined by $r \odot r' = \sum_{w \in \Sigma^*} r(w) r'(w) w$.
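The two products can be illustrated on polynomials (series with finite support). The following is a minimal sketch, assuming a series is stored as a Python dictionary mapping words to `Fraction` coefficients; the function names `cauchy_product` and `hadamard_product` are illustrative and do not come from the paper.

```python
from collections import defaultdict
from fractions import Fraction


def cauchy_product(r, s):
    """(rs)(w) = sum over all factorizations w = w1 w2 of r(w1) * s(w2)."""
    out = defaultdict(Fraction)  # missing coefficients default to 0
    for w1, c1 in r.items():
        for w2, c2 in s.items():
            out[w1 + w2] += c1 * c2
    return {w: c for w, c in out.items() if c != 0}


def hadamard_product(r, s):
    """(r . s)(w) = r(w) * s(w), coefficient by coefficient."""
    return {w: r[w] * s[w] for w in r if w in s and r[w] * s[w] != 0}


# r = 1 + 2a and s = 3a + (1/2)b over the alphabet {a, b}
r = {"": Fraction(1), "a": Fraction(2)}
s = {"a": Fraction(3), "b": Fraction(1, 2)}

print(cauchy_product(r, s))    # coefficients of a, b, aa, ab
print(hadamard_product(r, s))  # only the word a is in both supports
```

The Cauchy product concatenates words, whereas the Hadamard product only keeps words that lie in both supports.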

A series $r$ is quasiregular if $r(\varepsilon) = 0$. Quasiregular series have the property that
for every $w \in \Sigma^*$, there exist finitely many integers $i$ such that $r^i(w) \neq 0$, where
the exponent $i$ of $r^i$ refers to the Cauchy product. Let $r$ be a quasiregular series; $r^*$
(resp. $r^+$) is defined by $r^*(w) = \sum_{i \geq 0} r^i(w)$ (resp. $r^+(w) = \sum_{i \geq 1} r^i(w)$).

A subsemiring $R$ of $K\langle\langle\Sigma\rangle\rangle$ is rationally closed if $r^* \in R$ for every
quasiregular element $r$ of $R$. The family $K^{rat}\langle\langle\Sigma\rangle\rangle$ of $K$-rational series
over $\Sigma$ is the smallest rationally closed subset of $K\langle\langle\Sigma\rangle\rangle$ which contains
all polynomials. When $K$ is commutative, the Hadamard product of two rational series is a
rational series.

Let $K$ be a semiring and let $m, n$ be two integers. Let us denote by $K^{m \times n}$ the set
of $m \times n$ matrices whose elements belong to $K$ and by $I_n$ the $n \times n$ matrix whose
diagonal elements are equal to $1$ and whose other elements are all null. Note that
$K^{n \times n}$ forms a semiring.

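A series $r$ is recognizable if there exist a multiplicative homomorphism
$\mu : \Sigma^* \to K^{n \times n}$, $n \geq 1$, and two matrices $\lambda \in K^{1 \times n}$,
$\gamma \in K^{n \times 1}$ such that for every $w \in \Sigma^*$, $r(w) = \lambda\mu(w)\gamma$. The tuple
$(\lambda, \mu, \gamma)$ is called an $n$-dimensional linear representation of $r$. A linear
representation of $r$ is said to be reduced if its dimension is minimal.

Let us denote by $K^{rec}\langle\langle\Sigma\rangle\rangle$ the set of all recognizable series.

Theorem 1. [Sch61] The families $K^{rat}\langle\langle\Sigma\rangle\rangle$ and $K^{rec}\langle\langle\Sigma\rangle\rangle$ coincide.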
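The evaluation $r(w) = \lambda\mu(w)\gamma$ of a linear representation can be sketched as follows. This is a minimal illustration, assuming matrices are stored as lists of rows and $\mu$ as a dictionary from letters to matrices; the two-dimensional representation used here (which computes the number of occurrences of the letter a in a word) is an example chosen for this sketch, not one taken from the paper.

```python
from fractions import Fraction


def mat_vec(matrix, vector):
    """Product of a square matrix (list of rows) with a column vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def evaluate(lam, mu, gamma, word):
    """Return lam * mu(w_1) * ... * mu(w_k) * gamma for word = w_1 ... w_k."""
    vec = list(gamma)
    # Apply the matrices from the last letter to the first one,
    # so that only matrix-vector products are needed.
    for letter in reversed(word):
        vec = mat_vec(mu[letter], vec)
    return sum(l * v for l, v in zip(lam, vec))


# Illustrative representation of the series r(w) = |w|_a over {a, b}
lam = [Fraction(1), Fraction(0)]
gamma = [Fraction(0), Fraction(1)]
mu = {
    "a": [[Fraction(1), Fraction(1)], [Fraction(0), Fraction(1)]],
    "b": [[Fraction(1), Fraction(0)], [Fraction(0), Fraction(1)]],
}

print(evaluate(lam, mu, gamma, "abaab"))  # number of a's in the word
```

Since $\mu$ is a homomorphism, $\mu(w)$ is the product of the matrices of the letters of $w$, which is why one pass over the word suffices.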

Let $K$ be a semiring. Then a commutative monoid $V$ is called a $K$-semimodule if
there is an operation $\cdot$ from $K \times V$ into $V$ such that for any $a, b \in K$, $v, w \in V$,

1. $(ab) \cdot v = a \cdot (b \cdot v)$,

2. $(a + b) \cdot v = a \cdot v + b \cdot v$ and $a \cdot (v + w) = a \cdot v + a \cdot w$,

3. $1 \cdot v = v$ and $0 \cdot v = 0$.

If $S$ is a subset of a $K$-semimodule $V$, the subsemimodule $[S]$ generated by $S$ is
the smallest of all subsemimodules of $V$ containing $S$. It can be proved that
$[S] = \{a_1 s_1 + \ldots + a_n s_n \mid n \in \mathbb{N}, a_i \in K, s_i \in S\}$.

Let us consider the semimodule $K^{\Sigma^*}$ of all functions $F : \Sigma^* \to K$. For any word
$u$ of $\Sigma^*$ and any function $F$ of $K^{\Sigma^*}$, we define a new function $\dot{u}F$ by
$\dot{u}F(v) = F(uv)$ for any word $v$. The operator transforming $F$ into $\dot{u}F$ is linear:
for any $F, G \in K^{\Sigma^*}$ and $a \in K$, $\dot{u}(a \cdot F) = a \cdot \dot{u}F$ and
$\dot{u}(F + G) = \dot{u}F + \dot{u}G$. A subset $B$ of $K^{\Sigma^*}$ is called stable if the
conditions $u \in \Sigma^*$ and $F \in B$ imply that $\dot{u}F \in B$.

Theorem 2. [Fli74,Jac75] Suppose that $K$ is a commutative semiring and $r$ belongs
to $K\langle\langle\Sigma\rangle\rangle$. Then the following three conditions are equivalent:

1. $r$ belongs to $K^{rat}\langle\langle\Sigma\rangle\rangle$;

2. the subsemimodule of $K\langle\langle\Sigma\rangle\rangle$ generated by $\{\dot{u}r \mid u \in \Sigma^*\}$ is
contained in a finitely generated stable subsemimodule of $K^{\Sigma^*}$;

3. $r$ belongs to a finitely generated stable subsemimodule of $K^{\Sigma^*}$.

When $K$ is not a field, it may happen that a series $r$ belongs to a finitely generated
stable subsemimodule of $K\langle\langle\Sigma\rangle\rangle$, and hence is a rational series, while the
stable subsemimodule generated by $\{\dot{u}r \mid u \in \Sigma^*\}$ is not finitely generated. An
example of this situation is provided in Example 1.

Two linear representations $(\lambda, \mu, \gamma)$ and $(\lambda', \mu', \gamma')$ of a rational series $r$
are similar if there exists an invertible matrix $m \in K^{n \times n}$ such that $\lambda' = \lambda m$,
$\mu'(w) = m^{-1}\mu(w)m$ for any word $w$ and $\gamma' = m^{-1}\gamma$.

Theorem 3. [Sch61,Fli74] Assume that $K$ is a commutative field. Then any two reduced
linear representations $(\lambda, \mu, \gamma)$ and $(\lambda', \mu', \gamma')$ of a rational series $r$ are
similar. The dimension of any reduced linear representation of $r$ is also the dimension
of the vector subspace generated by $\{\dot{u}r \mid u \in \Sigma^*\}$.

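Let $K$ be a subsemiring of $K'$. $K'$ is said to be a Fatou extension of $K$ if every
rational series over $K'$ with coefficients in $K$ is a rational series over $K$. It has
been shown in [Fli74] that when $K$ and $K'$ are commutative fields then $K'$ is a
Fatou extension of $K$. Therefore, $\mathbb{R}$ is a Fatou extension of $\mathbb{Q}$: any rational series
over $\mathbb{R}$ which only takes rational values is a rational series over $\mathbb{Q}$:
$\mathbb{R}^{rat}\langle\langle\Sigma\rangle\rangle \cap \mathbb{Q}\langle\langle\Sigma\rangle\rangle = \mathbb{Q}^{rat}\langle\langle\Sigma\rangle\rangle$. It has
also been proved that $\mathbb{R}^+$ is not a Fatou extension of $\mathbb{Q}^+$:
$\mathbb{Q}^{+rat}\langle\langle\Sigma\rangle\rangle \subsetneq \mathbb{R}^{+rat}\langle\langle\Sigma\rangle\rangle \cap \mathbb{Q}^+\langle\langle\Sigma\rangle\rangle$.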



2.2 Stochastic languages 

A stochastic language is a formal series $p$ which takes its values in $\mathbb{R}^+$ and such that
$\sum_{w \in \Sigma^*} p(w) = 1$. For any stochastic language $p$ and any language $L \subseteq \Sigma^*$,
the sum $\sum_{w \in L} p(w)$ is defined without ambiguity. So, let us denote
$\sum_{w \in L} p(w)$ by $p(L)$. The set of all stochastic languages over $\Sigma$ is denoted by
$\mathcal{S}(\Sigma)$. For any stochastic language $p$ and any word $u$ such that $p(u\Sigma^*) \neq 0$, we
define the stochastic language $u^{-1}p$ by

$$u^{-1}p(w) = \frac{p(uw)}{p(u\Sigma^*)}.$$

$u^{-1}p$ is called the residual language of $p$ wrt $u$. Let us denote by $res(p)$ the set
$\{u \in \Sigma^* \mid p(u\Sigma^*) \neq 0\}$ and by $Res(p)$ the set $\{u^{-1}p \mid u \in res(p)\}$. For
any $K \in \{\mathbb{R}, \mathbb{R}^+, \mathbb{Q}, \mathbb{Q}^+\}$, define
$\mathcal{S}_K^{rat}(\Sigma) = K^{rat}\langle\langle\Sigma\rangle\rangle \cap \mathcal{S}(\Sigma)$, the set of rational stochastic
languages over $K$. Let $S = \{s_1, \ldots, s_n\}$ be a finite subset of $\mathcal{S}(\Sigma)$. The convex
hull of $S$ in $K\langle\langle\Sigma\rangle\rangle$ is defined by
$conv_K(S) = \{s \in K\langle\langle\Sigma\rangle\rangle \mid s = \alpha_1 \cdot s_1 + \ldots + \alpha_n \cdot s_n$, where each
$\alpha_i \in K$, $\alpha_i \geq 0$ and $\alpha_1 + \ldots + \alpha_n = 1\}$. Clearly, any element of
$conv_K(S)$ is a stochastic language.

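Example 1. Let $\Sigma = \{a\}$, and let $p_1$, $p_2$ and $p$ be the rational stochastic languages
over $\mathbb{R}^+$ defined on $\Sigma^*$ by

$$p_1(a^n) = 2^{-(n+1)}, \quad p_2(a^n) = 3 \cdot 2^{-(2n+2)} \quad \text{and} \quad p = (p_1 + p_2)/2.$$

Check that

$$\dot{a^n}p_1 = \frac{p_1}{2^n}, \quad \dot{a^n}p_2 = \frac{p_2}{2^{2n}} \quad \text{and} \quad \dot{a^n}p = \frac{2^n p_1 + p_2}{2^{2n+1}}$$

and

$$(a^n)^{-1}p_1 = p_1, \quad (a^n)^{-1}p_2 = p_2 \quad \text{and} \quad (a^n)^{-1}p = \frac{2^n p_1 + p_2}{2^n + 1}.$$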

Let $V$ be the vector subspace of $\mathbb{R}\langle\langle\Sigma\rangle\rangle$ generated by $p_1$ and $p_2$: $V$ is
represented on Figure 1. The subsemimodule of $\mathbb{R}^+\langle\langle\Sigma\rangle\rangle$ generated by $p_1$ and
$p_2$ corresponds to the closed halfcone $C$ delimited by the halflines $[Op_1)$ and $[Op_2)$.
The line $(p_1 p_2)$ is composed of the rational series $r$ in $V$ which satisfy
$r(\Sigma^*) = 1$. Let $q = \alpha p_1 + (1 - \alpha)p_2$. The constraint $q(a^n) \geq 0$ is equivalent to
the inequality

$$(2^{n+1} - 3)\alpha + 3 \geq 0.$$

The series $q$ such that $q(a^n) \geq 0$ for any integer $n$ must satisfy

$$0 \leq \alpha \leq 3.$$

Fig. 1. The stable subsemimodule of $\mathbb{R}^+\langle\langle\Sigma\rangle\rangle$ generated by $p$ is equal to $V_p$: it
does not contain the halfline $[Op_1)$ and it is not finitely generated.






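Let $p_3 = 3p_1 - 2p_2$. The stochastic languages in $V$ are the points of the line $(p_2 p_3)$
which lie between $p_2$ and $p_3$.

Let $V_p$ be the subsemimodule of $\mathbb{R}^+\langle\langle\Sigma\rangle\rangle$ generated by
$\{\dot{u}p \mid u \in \Sigma^*\}$. Check that
$V_p = \{t(\alpha p_1 + (1 - \alpha)p_2) \mid 1/2 \leq \alpha < 1, t \in \mathbb{R}^+\}$ and that $V_p$ is not
finitely generated.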
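The identities of Example 1 can be checked with exact rational arithmetic. The sketch below is only a numerical sanity check on finitely many values, not a proof; the helper names are hypothetical, and the closed form used for $p(a^n\Sigma^*)$ is an assumption of the sketch obtained by summing the two geometric series.

```python
from fractions import Fraction


def p1(n):
    return Fraction(1, 2 ** (n + 1))


def p2(n):
    return Fraction(3, 2 ** (2 * n + 2))


def p(n):
    return (p1(n) + p2(n)) / 2


def prefix_mass(n):
    """p(a^n Sigma^*): the tails of p1 and p2 sum to 2^-n and 2^-2n."""
    return (Fraction(1, 2 ** n) + Fraction(1, 4 ** n)) / 2


def residual(n, m):
    """Value of (a^n)^{-1} p on the word a^m, from the definition."""
    return p(n + m) / prefix_mass(n)


def claimed(n, m):
    """Value of (2^n p1 + p2) / (2^n + 1) on the word a^m."""
    return (2 ** n * p1(m) + p2(m)) / (2 ** n + 1)


def alpha(n):
    """Weight of p1 in the residual (a^n)^{-1} p."""
    return Fraction(2 ** n, 2 ** n + 1)


ok = all(residual(n, m) == claimed(n, m) for n in range(8) for m in range(8))
print(ok, [alpha(n) for n in range(4)])
```

The weights `alpha(n)` increase towards 1 without reaching it, which is the reason why the half-line $[Op_1)$ is not contained in $V_p$.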

2.3 Automata 

A non deterministic finite automaton (NFA) is a tuple $(\Sigma, Q, Q_I, Q_T, \delta)$ where $Q$ is
a finite set of states, $Q_I \subseteq Q$ is the set of initial states, $Q_T \subseteq Q$ is the set of
final states, and $\delta$ is the transition function defined from $Q \times \Sigma$ to $2^Q$. Let $\delta$
also denote the extended transition function defined from $2^Q \times \Sigma^*$ to $2^Q$ by
$\delta(q, \varepsilon) = \{q\}$, $\delta(q, wx) = \bigcup_{q' \in \delta(q, w)} \delta(q', x)$ and
$\delta(R, w) = \bigcup_{q \in R} \delta(q, w)$ for any $q \in Q$, $R \subseteq Q$, $x \in \Sigma$ and
$w \in \Sigma^*$. An NFA is deterministic (DFA) if $Q_I$ contains only one element $q_0$ and if
$\forall q \in Q$, $\forall x \in \Sigma$, $|\delta(q, x)| \leq 1$.

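Let $K$ be a semiring. A $K$-multiplicity automaton (MA) is a 5-tuple
$(\Sigma, Q, \varphi, \iota, \tau)$ where $Q$ is a finite set of states,
$\varphi : Q \times \Sigma \times Q \to K$ is the transition function, $\iota : Q \to K$ is the
initialization function and $\tau : Q \to K$ is the termination function. Let
$Q_I = \{q \in Q \mid \iota(q) \neq 0\}$ be the set of initial states and
$Q_T = \{q \in Q \mid \tau(q) \neq 0\}$ be the set of terminal states. The support of an MA
$(\Sigma, Q, \varphi, \iota, \tau)$ is the NFA $(\Sigma, Q, Q_I, Q_T, \delta)$ where
$\delta(q, x) = \{q' \in Q \mid \varphi(q, x, q') \neq 0\}$. We extend the transition function
$\varphi$ to $Q \times \Sigma^* \times Q$ by $\varphi(q, wx, r) = \sum_{s \in Q} \varphi(q, w, s)\varphi(s, x, r)$
and $\varphi(q, \varepsilon, r) = 1$ if $q = r$ and $0$ otherwise, for any $q, r \in Q$, $x \in \Sigma$ and
$w \in \Sigma^*$. For any finite subset $L \subset \Sigma^*$ and any $R \subseteq Q$, define
$\varphi(q, L, R) = \sum_{w \in L, r \in R} \varphi(q, w, r)$.

For any MA $A = (\Sigma, Q, \varphi, \iota, \tau)$, we define the series $r_A$ by

$$r_A(w) = \sum_{q, r \in Q} \iota(q)\varphi(q, w, r)\tau(r).$$

For any $q \in Q$, we define the series $r_{A,q}$ by $r_{A,q}(w) = \sum_{r \in Q} \varphi(q, w, r)\tau(r)$.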
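The definition of $r_A$ can be turned into a short forward computation. The following is a minimal sketch, assuming the functions $\varphi$, $\iota$ and $\tau$ are stored as dictionaries in which missing entries stand for 0; the function name `series_value` is hypothetical. The automaton used is the one whose parameters are spelled out in the caption of Fig. 2 (automaton A).

```python
from fractions import Fraction

ZERO = Fraction(0)


def series_value(states, phi, iota, tau, word):
    """Compute r_A(word) = sum_{q,r} iota(q) * phi(q, word, r) * tau(r)."""
    # weight[q] holds sum_p iota(p) * phi(p, prefix, q) for the prefix read so far
    weight = {q: iota.get(q, ZERO) for q in states}
    for letter in word:
        nxt = {q: ZERO for q in states}
        for (src, x, dst), coeff in phi.items():
            if x == letter:
                nxt[dst] += weight[src] * coeff
        weight = nxt
    return sum(weight[q] * tau.get(q, ZERO) for q in states)


# Automaton A of Fig. 2: iota(q0) = 1, tau(q1) = 1,
# phi(q0, a, q1) = 0.5 and phi(q0, b, q0) = 0.5
states = ["q0", "q1"]
iota = {"q0": Fraction(1)}
tau = {"q1": Fraction(1)}
phi = {("q0", "a", "q1"): Fraction(1, 2), ("q0", "b", "q0"): Fraction(1, 2)}

print(series_value(states, phi, iota, tau, "bba"))
```

On this automaton the only words with a non-zero value have the form $b^k a$.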

If the semiring $K$ is positive, it can be shown that the support of the series $r_A$
defined by a $K$-multiplicity automaton $A$ is equal to the language defined by the support
of $A$. In particular, $supp(r_A)$ is a regular language. This property is false in general
when $K$ is not positive.

Two MA $A$ and $A'$ are equivalent if they define the same series, i.e. if $r_A = r_{A'}$.

Let $A = (\Sigma, Q, \varphi, \iota, \tau)$ be a $K$-MA and let $q \in Q$. Suppose that there exist
coefficients $\alpha_{q'} \in K$ for $q' \in Q' = Q \setminus \{q\}$ such that
$r_{A,q} = \sum_{q' \in Q'} \alpha_{q'} r_{A,q'}$. Let $A' = (\Sigma, Q', \varphi', \iota', \tau')$ where

- $\varphi'(r, x, s) = \varphi(r, x, s) + \alpha_s \varphi(r, x, q)$ for any $r, s \in Q'$ and $x \in \Sigma$,

- $\iota'(r) = \iota(r) + \alpha_r \iota(q)$ for any $r \in Q'$,

- $\tau'(r) = \tau(r)$ for any $r \in Q'$.

The multiplicity automaton $A'$ is called a $K$-reduction of $A$. A multiplicity automaton
$A$ is called $K$-reduced if it has no $K$-reduction.
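The construction of a $K$-reduction can be sketched directly from the three defining equations. In this minimal illustration the automaton is stored as dictionaries (missing entries stand for 0), the coefficients $\alpha_{q'}$ are supplied by the caller, and the function names and the three-state example are assumptions of the sketch. In the example, states 1 and 2 carry the same series, so state 2 can be removed with $\alpha_0 = 0$ and $\alpha_1 = 1$.

```python
from fractions import Fraction

ZERO = Fraction(0)


def k_reduction(states, alphabet, phi, iota, tau, q, alpha):
    """Remove state q, assuming r_{A,q} = sum_{q'} alpha[q'] * r_{A,q'}."""
    rest = [s for s in states if s != q]
    phi2 = {(r, x, s): phi.get((r, x, s), ZERO) + alpha[s] * phi.get((r, x, q), ZERO)
            for r in rest for x in alphabet for s in rest}
    iota2 = {r: iota.get(r, ZERO) + alpha[r] * iota.get(q, ZERO) for r in rest}
    tau2 = {r: tau.get(r, ZERO) for r in rest}
    return rest, phi2, iota2, tau2


def series_value(states, phi, iota, tau, word):
    """r_A(word), computed by propagating weights along the word."""
    weight = {s: iota.get(s, ZERO) for s in states}
    for letter in word:
        weight = {s: sum(weight[r] * phi.get((r, letter, s), ZERO) for r in states)
                  for s in states}
    return sum(weight[s] * tau.get(s, ZERO) for s in states)


half, quarter = Fraction(1, 2), Fraction(1, 4)
states, alphabet = [0, 1, 2], ["a"]
iota = {0: Fraction(1)}
tau = {0: half, 1: half, 2: half}
phi = {(0, "a", 1): quarter, (0, "a", 2): quarter, (1, "a", 1): half, (2, "a", 2): half}

reduced = k_reduction(states, alphabet, phi, iota, tau, q=2, alpha={0: ZERO, 1: Fraction(1)})
same = all(series_value(states, phi, iota, tau, "a" * n) == series_value(*reduced, "a" * n)
           for n in range(8))
print(reduced[0], same)
```

The comparison on the words $a^0, \ldots, a^7$ is only a spot check; Proposition 1 gives the general statement.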

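Proposition 1. Let $A = (\Sigma, Q, \varphi, \iota, \tau)$ be a $K$-MA and let
$A' = (\Sigma, Q', \varphi', \iota', \tau')$ be a $K$-reduction of $A$. Then, for any state
$q' \in Q'$, $r_{A',q'} = r_{A,q'}$. As a consequence, $r_{A'} = r_A$.

Proof. Let $Q' = Q \setminus \{q\}$ and let $\alpha_{q'} \in K$ for any $q' \in Q'$ such that
$r_{A,q} = \sum_{q' \in Q'} \alpha_{q'} r_{A,q'}$. For any state $r \in Q'$, we have

$$r_{A',r}(\varepsilon) = \tau'(r) = \tau(r) = r_{A,r}(\varepsilon).$$

Now, assume that for any word $w$ of length $\leq k$ and any state $r \in Q'$ we have
$r_{A',r}(w) = r_{A,r}(w)$. Let $x$ be a letter; we have:

$$r_{A',r}(xw) = \sum_{s \in Q'} \varphi'(r, x, s) r_{A',s}(w) = \sum_{s \in Q'} (\varphi(r, x, s) + \alpha_s \varphi(r, x, q)) r_{A,s}(w)$$

$$= \sum_{s \in Q'} \varphi(r, x, s) r_{A,s}(w) + \varphi(r, x, q) \sum_{s \in Q'} \alpha_s r_{A,s}(w)$$

$$= \sum_{s \in Q'} \varphi(r, x, s) r_{A,s}(w) + \varphi(r, x, q) r_{A,q}(w)$$

$$= \sum_{s \in Q} \varphi(r, x, s) r_{A,s}(w) = r_{A,r}(xw).$$

Hence, $r_{A',r} = r_{A,r}$ for any $r$ of $Q'$. Moreover,

$$r_{A'} = \sum_{s \in Q'} \iota'(s) r_{A,s} = \sum_{s \in Q'} (\iota(s) + \alpha_s \iota(q)) r_{A,s} = \sum_{s \in Q'} \iota(s) r_{A,s} + \iota(q) r_{A,q} = r_A.$$

□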

A state $q \in Q$ is accessible (resp. co-accessible) if there exist $q_0 \in Q_I$ (resp.
$q_t \in Q_T$) and $u \in \Sigma^*$ such that $\varphi(q_0, u, q) \neq 0$ (resp. $\varphi(q, u, q_t) \neq 0$).
An MA is trimmed if all its states are accessible and co-accessible. Given an MA $A$, a
trimmed MA equivalent to $A$ can efficiently be computed from $A$.

From now on, we only consider trimmed MA.

We shall consider several subclasses of multiplicity automata, defined as follows:

A semi Probabilistic Automaton (semi-PA) is an MA $(\Sigma, Q, \varphi, \iota, \tau)$ such that
$\iota$, $\varphi$ and $\tau$ take their values in $[0, 1]$, such that $\sum_{q \in Q} \iota(q) \leq 1$ and,
for any state $q$, $\tau(q) + \varphi(q, \Sigma, Q) \leq 1$. Semi-PA generate rational series over
$\mathbb{R}^+$.

A Probabilistic Automaton (PA) is a trimmed semi-PA $(\Sigma, Q, \varphi, \iota, \tau)$ such that
$\sum_{q \in Q} \iota(q) = 1$ and, for any state $q$, $\tau(q) + \varphi(q, \Sigma, Q) = 1$. Probabilistic
automata generate stochastic languages.

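Proposition 2. Let $A = (\Sigma, Q, \varphi, \iota, \tau)$ be a $K$-semi-PA (resp. a $K$-PA). For any
$q \in Q$, $\sum_{w \in \Sigma^*} r_{A,q}(w) \leq 1$ (resp. $\sum_{w \in \Sigma^*} r_{A,q}(w) = 1$). As a
consequence, $\sum_{w \in \Sigma^*} r_A(w) \leq 1$ (resp. $\sum_{w \in \Sigma^*} r_A(w) = 1$).

Proof. For any integer $k$ and any $q \in Q$, we have

$$\sum_{|w| \leq k+1} r_{A,q}(w) + \varphi(q, \Sigma^{k+2}, Q)
= \sum_{|w| \leq k} r_{A,q}(w) + \sum_{r \in Q} \varphi(q, \Sigma^{k+1}, r)\tau(r) + \sum_{r \in Q} \varphi(q, \Sigma^{k+1}, r)\varphi(r, \Sigma, Q)$$

$$= \sum_{|w| \leq k} r_{A,q}(w) + \sum_{r \in Q} \varphi(q, \Sigma^{k+1}, r)\left[\tau(r) + \varphi(r, \Sigma, Q)\right].$$

From this relation, it is easy to infer by induction on $k$ that

$$\sum_{|w| \leq k} r_{A,q}(w) + \sum_{r \in Q} \varphi(q, \Sigma^{k+1}, r) \leq 1 \quad (\text{resp.} = 1)$$

when $A$ is a semi-PA (resp. a PA).

A first consequence is that

$$\sum_{w \in \Sigma^*} r_{A,q}(w) \leq 1 \quad \text{and} \quad \sum_{w \in \Sigma^*} r_A(w) = \sum_{w \in \Sigma^*} \sum_{q \in Q} \iota(q) r_{A,q}(w) \leq 1.$$

Let $n = |Q|$. Since $A$ is trimmed, for any state $q$ there exists a word $u$ with $|u| < n$
such that $r_{A,q}(u) > 0$. Therefore, there exists $\alpha < 1$ such that
$\varphi(q, \Sigma^n, Q) \leq \alpha$ for any state $q$. It can easily be shown, by induction on the
integer $k$, that $\varphi(q, \Sigma^{kn}, Q) \leq \alpha^k$.

Now, when $A$ is a PA, we have

$$\sum_{w \in \Sigma^*} r_{A,q}(w) \geq \sum_{|w| < kn} r_{A,q}(w) = 1 - \varphi(q, \Sigma^{kn}, Q) \geq 1 - \alpha^k$$

for any integer $k$. Therefore,

$$\sum_{w \in \Sigma^*} r_{A,q}(w) = 1.$$

Finally,

$$\sum_{w \in \Sigma^*} r_A(w) = \sum_{w \in \Sigma^*} \sum_{q \in Q} \iota(q) r_{A,q}(w) = \sum_{q \in Q} \iota(q) = 1.$$

□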
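Proposition 2 can be observed numerically on a small PA. The sketch below first checks the two normalization conditions of a PA (trimness is not checked) and then accumulates the mass of all words of length at most $k$; the function names are hypothetical and the automaton is again automaton A of Fig. 2, for which this mass equals $1 - 2^{-k}$.

```python
from fractions import Fraction

ZERO = Fraction(0)


def satisfies_pa_equations(states, alphabet, phi, iota, tau):
    """Check sum_q iota(q) = 1 and tau(q) + phi(q, Sigma, Q) = 1 for every q."""
    if sum(iota.get(q, ZERO) for q in states) != 1:
        return False
    return all(
        tau.get(q, ZERO)
        + sum(phi.get((q, x, s), ZERO) for x in alphabet for s in states) == 1
        for q in states
    )


def mass_up_to(states, alphabet, phi, iota, tau, k):
    """Sum of r_A(w) over all words w of length at most k."""
    weight = {q: iota.get(q, ZERO) for q in states}
    total = ZERO
    for _ in range(k + 1):
        # mass of the words of the current length
        total += sum(weight[q] * tau.get(q, ZERO) for q in states)
        # move one letter forward, summing over all letters
        weight = {s: sum(weight[r] * phi.get((r, x, s), ZERO)
                         for r in states for x in alphabet)
                  for s in states}
    return total


states, alphabet = ["q0", "q1"], ["a", "b"]
iota = {"q0": Fraction(1)}
tau = {"q1": Fraction(1)}
phi = {("q0", "a", "q1"): Fraction(1, 2), ("q0", "b", "q0"): Fraction(1, 2)}

print(satisfies_pa_equations(states, alphabet, phi, iota, tau))
print([mass_up_to(states, alphabet, phi, iota, tau, k) for k in range(5)])
```

The partial sums increase towards 1, in agreement with the bound $1 - \alpha^k$ used in the proof.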

It can easily be deduced from Proposition 2 that an $\mathbb{R}^+$-reduction of a PA is still a PA
(the property is false in general for a semi-PA).

A Probabilistic Residual Automaton (PRA) is a PA $(\Sigma, Q, \varphi, \iota, \tau)$ such that for any
$q \in Q$, there exists a word $u$ such that $r_{A,q} = u^{-1}r_A$. Check that an
$\mathbb{R}^+$-reduction of a PRA is still a PRA, since the series associated with the states
remain unchanged within a reduction.

A Probabilistic Deterministic Automaton (PDA) is a PA whose support is deterministic.
Check that a PDA is a PRA. Therefore, an $\mathbb{R}^+$-reduction of a PDA is a PRA, but since
reduction introduces non-determinism, it is no longer a PDA.

For any class $C$ of $K$-multiplicity automata, let us denote by $\mathcal{S}_K^C(\Sigma)$ the class of
all stochastic languages which are recognized by an element of $C$.

Fig. 2. Automata A, B, C and D. Let us make the notation precise on automaton A: $q_0$ is
the unique initial state and $\iota(q_0) = 1$, $q_1$ is the unique terminal state and
$\tau(q_1) = 1$, $\varphi(q_0, a, q_1) = 0.5$, $\varphi(q_0, b, q_0) = 0.5$ and all other transitions
satisfy $\varphi(q, x, q') = 0$. A is a PDA; B is a PRA since the series associated with its
states are residual languages of $r_B$; C is also a PRA since, in particular,
$r_{C,q_1} = a^{-1}r_C$; it can easily be shown that D is not a PRA.

2.4 Equivalent representations of rational series 

Stable finitely generated subsemimodules, linear representations and multiplicity
automata provide us with several representations of rational series. The following
classical claims show that they are equivalent: in particular, a series $r$ over $K$ is
rational iff there exists a $K$-multiplicity automaton $A$ such that $r = r_A$. Moreover,
any one of these representations can efficiently be derived from any other one.

Claim 1 Let $M$ be a stable subsemimodule of $K\langle\langle\Sigma\rangle\rangle$ generated by
$r_1, \ldots, r_n$ and containing the series $r$. Let $\alpha_i$ and $\alpha^x_{i,j}$ be coefficients in
$K$ defined for any letter $x$ and any $1 \leq i, j \leq n$ such that

$$r = \sum_{i=1}^{n} \alpha_i r_i \quad \text{and} \quad \dot{x}r_i = \sum_{j=1}^{n} \alpha^x_{i,j} r_j.$$

Let $(\lambda, \mu, \gamma)$ be the linear representation defined by $\lambda[1, i] = \alpha_i$,
$\mu(x)[i, j] = \alpha^x_{i,j}$ and $\gamma[i, 1] = r_i(\varepsilon)$ for any $1 \leq i, j \leq n$ and any
$x \in \Sigma$. Then $(\lambda, \mu, \gamma)$ is a linear representation of $r$.

Claim 2 Let $(\lambda, \mu, \gamma)$ be an $n$-dimensional linear representation of $r$ and let
$A = (\Sigma, Q, \varphi, \iota, \tau)$ be the MA defined by $Q = \{1, \ldots, n\}$, $\iota(i) = \lambda[1, i]$,
$\tau(i) = \gamma[i, 1]$ and $\varphi(i, x, j) = \mu(x)[i, j]$. Then $r = r_A$.

Claim 3 Let $A = (\Sigma, Q, \varphi, \iota, \tau)$ be an MA and let $M$ be the subsemimodule generated
by $\{r_{A,q} \mid q \in Q\}$. Then $M$ is a stable subsemimodule of $K\langle\langle\Sigma\rangle\rangle$ which
contains $r_A$.

The proofs of these claims are classical. We give them for the sake of completeness.

Proof (Claim 1). Let us prove by induction on the length of the word $w$ that for any
word $w$, $\mu(w)\gamma = (r_1(w), \ldots, r_n(w))^t$. From the definition,
$\mu(\varepsilon)\gamma = \gamma = (r_1(\varepsilon), \ldots, r_n(\varepsilon))^t$. Suppose that the relation is proved for
all words of length $\leq k$, let $w$ be such a word and let $x \in \Sigma$. Then

$$\mu(xw)\gamma = \mu(x)\mu(w)\gamma$$

$$= \mu(x)(r_1(w), \ldots, r_n(w))^t \quad \text{by induction hypothesis}$$

$$= (\dot{x}r_1(w), \ldots, \dot{x}r_n(w))^t$$

$$= (r_1(xw), \ldots, r_n(xw))^t.$$

Now, for any word $w$,

$$\lambda\mu(w)\gamma = \lambda(r_1(w), \ldots, r_n(w))^t = \sum_{i=1}^{n} \alpha_i r_i(w) = r(w).$$

□

Proof (Claim 2). For any word $w$, we have

$$r_A(w) = \sum_{i,j=1}^{n} \iota(i)\varphi(i, w, j)\tau(j) = \sum_{i,j=1}^{n} \lambda[1, i]\mu(w)[i, j]\gamma[j, 1] = \lambda\mu(w)\gamma.$$

□

Proof (Claim 3). First note that $r_A = \sum_{q \in Q} \iota(q) r_{A,q}$ and therefore, $r_A \in M$.
Next, for any letter $x$, any word $w$ and any state $q \in Q$,

$$\dot{x}r_{A,q}(w) = r_{A,q}(xw) = \sum_{q' \in Q} \varphi(q, x, q') r_{A,q'}(w)$$

and therefore,

$$\dot{x}r_{A,q} = \sum_{q' \in Q} \varphi(q, x, q') r_{A,q'}.$$

$M$ is a stable subsemimodule of $K\langle\langle\Sigma\rangle\rangle$. □

These equivalent characterizations make it possible to transfer definitions from one
representation mode to another: check that an $n$-dimensional linear representation of
a rational series over $K$ is reduced if and only if the corresponding multiplicity
automaton is $K$-reduced. Also, results obtained using one representation can immediately
be transferred to the other ones.

2.5 Computing equivalence and reduction of MA

Deciding whether two NFA are equivalent is a PSPACE-complete problem. However, 
deciding whether two MA are equivalent can be achieved within polynomial time. 

Proposition 3. It is decidable within polynomial time whether two MAs over $\mathbb{R}$ are
equivalent.

Proof. Let $A$ and $A'$ be two MA and let $(\lambda, \mu, \gamma)$ (resp. $(\lambda', \mu', \gamma')$) be an
$n$-dimensional (resp. $n'$-dimensional) linear representation of the rational series $r_A$
(resp. $r_{A'}$). For any word $w$ let $\theta(w) = (\mu(w)\gamma, \mu'(w)\gamma')$. Let $E$ be the vector
subspace of $\mathbb{R}^{n+n'}$ spanned by $\{\theta(w) \mid w \in \Sigma^*\}$ and let $T$ be the linear mapping
from $\mathbb{R}^{n+n'}$ to $\mathbb{R}$ defined by $T(u, u') = \lambda u - \lambda' u'$ for any $u \in \mathbb{R}^n$ and
$u' \in \mathbb{R}^{n'}$. The series $r_A$ and $r_{A'}$ are equal, i.e. $A$ and $A'$ are equivalent, iff
$\forall (u, u') \in E$, $T(u, u') = 0$, a property which can be checked within polynomial
time. □

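The following algorithm decides the equivalence of two MA:

Input: $A$, $A'$ MA
$B = \{\varepsilon\}$, $S = \{x \mid x \in \Sigma\}$
while $S \neq \emptyset$ do
    let $v$ be the smallest element in $S$ and let $S = S \setminus \{v\}$
    if $\theta(v)$ does not belong to the subspace spanned by $\theta(B)$ then
        $B = B \cup \{v\}$ and $S = S \cup \{xv \mid x \in \Sigma\}$
    end if
end while
while $B \neq \emptyset$ do
    let $v \in B$ and let $B = B \setminus \{v\}$
    if $T(\theta(v)) \neq 0$ then
        output no; exit
    end if
end while
output yes.

Words are extended on the left because $\theta(xv)$ is the image of $\theta(v)$ under the linear
map $(u, u') \mapsto (\mu(x)u, \mu'(x)u')$, so that the span of $\theta(B)$ is stable under these maps
when the first loop ends. The first part of the algorithm computes a basis of $E$; the
second part checks whether $T(E) = \{0\}$.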
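The procedure can be sketched in Python with exact rational arithmetic. This is a minimal illustration under stated assumptions, not the authors' implementation: each automaton is given by a linear representation `(lam, mu, gamma)` with matrices stored as lists of rows, the two loops are merged (the test $T(\theta(v)) \neq 0$ is done as soon as $v$ enters the basis), words are explored in breadth-first order, and words are extended on the left because $\theta(xv)$ is obtained from $\theta(v)$ by applying $\mu(x)$ and $\mu'(x)$ blockwise. All function names are hypothetical.

```python
from collections import deque
from fractions import Fraction


def mat_vec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def reduce_against(basis, vector):
    """Eliminate the pivot coordinates of the stored rows from vector."""
    vector = list(vector)
    for pivot, row in basis:
        if vector[pivot] != 0:
            factor = vector[pivot] / row[pivot]
            vector = [a - factor * b for a, b in zip(vector, row)]
    return vector


def equivalent(rep1, rep2, alphabet):
    """Return (True, None) if both representations define the same series,
    otherwise (False, u) with a word u on which the two series differ."""
    (lam1, mu1, gam1), (lam2, mu2, gam2) = rep1, rep2
    n1 = len(gam1)

    def t_map(theta):  # T(u, u') = lam1 * u - lam2 * u'
        return (sum(l * c for l, c in zip(lam1, theta[:n1]))
                - sum(l * c for l, c in zip(lam2, theta[n1:])))

    basis = []  # list of (pivot index, reduced row), in echelon form
    queue = deque([("", list(gam1) + list(gam2))])  # theta(empty word)
    while queue:
        word, theta = queue.popleft()
        rest = reduce_against(basis, theta)
        if all(c == 0 for c in rest):
            continue  # theta(word) is already in the span: prune
        if t_map(theta) != 0:
            return False, word
        pivot = next(i for i, c in enumerate(rest) if c != 0)
        basis.append((pivot, rest))
        for x in alphabet:
            queue.append((x + word,
                          mat_vec(mu1[x], theta[:n1]) + mat_vec(mu2[x], theta[n1:])))
    return True, None


half, third = Fraction(1, 2), Fraction(1, 3)
rep_a = ([Fraction(1)], {"a": [[half]]}, [half])
rep_b = ([half, half], {"a": [[half, Fraction(0)], [Fraction(0), half]]}, [half, half])
rep_c = ([Fraction(1)], {"a": [[third]]}, [half])

print(equivalent(rep_a, rep_b, ["a"]))  # same series r(a^n) = 2^-(n+1)
print(equivalent(rep_a, rep_c, ["a"]))  # differ on the word a
```

At most $n + n'$ words enter the basis, and each of them adds $|\Sigma|$ words to the queue, so the number of linear-algebra steps is polynomial in the sizes of the two representations.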

Note that when $A$ and $A'$ are not equivalent, the previous algorithm provides a
word $u$ such that $r_A(u) \neq r_{A'}(u)$ and whose length is $\leq |Q| + |Q'|$.

Proposition 4. Let $A_0, A_1, \ldots, A_n$ be MAs over $\mathbb{R}$. It is decidable within polynomial
time whether there exist $\alpha_1, \ldots, \alpha_n \in \mathbb{R}$ such that
$r_{A_0} = \sum_{i=1}^{n} \alpha_i r_{A_i}$. More precisely, all such tuples of parameters
$(\alpha_1, \ldots, \alpha_n)$ are solutions of a linear system computable within polynomial time.

Proof. Consider the following algorithm.

Let $Eq = \{r_{A_0}(\varepsilon) = \sum_{i=1}^{n} x_i r_{A_i}(\varepsilon)\}$
($Eq$ is a set of independent equations on variables $x_1, \ldots, x_n$)
While $Eq$ has a solution $(\alpha_1, \ldots, \alpha_n)$ such that $r_{A_0} \neq \sum_{i=1}^{n} \alpha_i r_{A_i}$
    Let $u$ be a word such that $r_{A_0}(u) \neq \sum_{i=1}^{n} \alpha_i r_{A_i}(u)$
    $Eq = Eq \cup \{r_{A_0}(u) = \sum_{i=1}^{n} x_i r_{A_i}(u)\}$
Output: $Eq$

From Proposition 3, if $r_{A_0} \neq \sum_{i=1}^{n} \alpha_i r_{A_i}$, a word $u$ such that
$r_{A_0}(u) \neq \sum_{i=1}^{n} \alpha_i r_{A_i}(u)$ and whose length is $\leq \sum_{i=0}^{n} |Q_i|$ can be
found within polynomial time (where $|Q_i|$ is the number of states of $A_i$). The algorithm
ends since $Eq$ has at most $n + 1$ elements. It is clear that $(\alpha_1, \ldots, \alpha_n)$ is a
solution of $Eq$ iff $r_{A_0} = \sum_{i=1}^{n} \alpha_i r_{A_i}$. □

A similar result holds when we ask for positive coefficients. 

Proposition 5. Let $A_0, A_1, \ldots, A_n$ be MAs over $\mathbb{R}$. It is decidable within polynomial
time whether there exist $\alpha_1, \ldots, \alpha_n \in \mathbb{R}^+$ such that
$r_{A_0} = \sum_{i=1}^{n} \alpha_i r_{A_i}$.

Proof. Add the constraints $x_1 \geq 0, \ldots, x_n \geq 0$ to the system $Eq$ in the previous
algorithm. A polynomial linear programming algorithm will then find a solution of $Eq$
or decide that $Eq$ has no solution. □

As a consequence of these propositions, it can efficiently be decided whether an
MA $A$ is $K$-reduced.

Proposition 6. Let $A = (\Sigma, Q, \varphi, \iota, \tau)$ be a $K$-MA. It is decidable within polynomial
time whether $A$ is $K$-reduced; if $A$ is not $K$-reduced, a $K$-reduction can be computed
within polynomial time.

Proof. For any $q \in Q$, check whether there exist coefficients $\alpha_{q'} \in K$ for
$q' \in Q' = Q \setminus \{q\}$ such that $r_{A,q} = \sum_{q' \in Q'} \alpha_{q'} r_{A,q'}$. If so, use these
coefficients to compute a $K$-reduction of $A$. □

3 Rational stochastic languages 

The objects we study are rational stochastic languages, i.e. stochastic languages which
are also rational series. A rational stochastic language can always be generated by
using a multiplicity automaton. But depending on the set $K$ of numbers used for the
parameters, we obtain different sets $\mathcal{S}_K^{rat}(\Sigma)$ of rational stochastic languages.
In the following, we suppose that $K \in \{\mathbb{R}, \mathbb{Q}, \mathbb{R}^+, \mathbb{Q}^+\}$. First, we study the
relations between all these classes of rational stochastic languages and next, we give a
characterization of $\mathcal{S}_K^{rat}(\Sigma)$ in terms of stable subsemimodules of $\mathcal{S}(\Sigma)$.

3.1 Relations between classes of rational stochastic languages 

Let us begin by the simplest inclusions. 
Proposition 7. 

Moreover, 
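Proof. Let $K_1$ be a subsemiring of $K_2$. We have
$K_1^{rat}\langle\langle\Sigma\rangle\rangle \subseteq K_2^{rat}\langle\langle\Sigma\rangle\rangle$ and hence,
$\mathcal{S}_{K_1}^{rat}(\Sigma) \subseteq \mathcal{S}_{K_2}^{rat}(\Sigma)$.

Now, let $r$ be the rational series defined on $\Sigma = \{a\}$ by $r(\varepsilon) = \sqrt{2}/2$,
$r(a) = 1 - \sqrt{2}/2$ and $r(a^n) = 0$ for any $n \geq 2$. Clearly,
$r \in \mathcal{S}_{\mathbb{R}^+}^{rat}(\Sigma) \setminus \mathbb{Q}\langle\langle\Sigma\rangle\rangle$, which implies that
$\mathcal{S}_{\mathbb{Q}}^{rat}(\Sigma) \subsetneq \mathcal{S}_{\mathbb{R}}^{rat}(\Sigma)$ and
$\mathcal{S}_{\mathbb{Q}^+}^{rat}(\Sigma) \subsetneq \mathcal{S}_{\mathbb{R}^+}^{rat}(\Sigma)$. □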

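A rational stochastic language over $\mathbb{R}$ which only takes rational values is a rational
stochastic language over $\mathbb{Q}$.

Proposition 8. $\mathcal{S}_{\mathbb{R}}^{rat}(\Sigma) \cap \mathbb{Q}\langle\langle\Sigma\rangle\rangle = \mathcal{S}_{\mathbb{Q}}^{rat}(\Sigma)$.

Proof. Recall that $\mathbb{R}$ is a Fatou extension of $\mathbb{Q}$: any rational series over $\mathbb{R}$ which
only takes rational values is a rational series over $\mathbb{Q}$, i.e.

$$\mathbb{R}^{rat}\langle\langle\Sigma\rangle\rangle \cap \mathbb{Q}\langle\langle\Sigma\rangle\rangle = \mathbb{Q}^{rat}\langle\langle\Sigma\rangle\rangle.$$

As a consequence,

$$\mathcal{S}_{\mathbb{R}}^{rat}(\Sigma) \cap \mathbb{Q}\langle\langle\Sigma\rangle\rangle = \mathcal{S}(\Sigma) \cap \mathbb{R}^{rat}\langle\langle\Sigma\rangle\rangle \cap \mathbb{Q}\langle\langle\Sigma\rangle\rangle
= \mathcal{S}(\Sigma) \cap \mathbb{Q}^{rat}\langle\langle\Sigma\rangle\rangle = \mathcal{S}_{\mathbb{Q}}^{rat}(\Sigma).$$

□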

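It has also been proved that $\mathbb{R}^+$ is not a Fatou extension of $\mathbb{Q}^+$:
$\mathbb{Q}^{+\,rat}\langle\langle\Sigma\rangle\rangle \subsetneq \mathbb{R}^{+\,rat}\langle\langle\Sigma\rangle\rangle \cap \mathbb{Q}^+\langle\langle\Sigma\rangle\rangle$. We prove below that this result can be
extended to stochastic languages: there exists a rational stochastic language over $\mathbb{R}^+$
which takes only rational values and which is not a rational stochastic language over $\mathbb{Q}^+$.

Proposition 9. $\mathcal{S}^{rat}_{\mathbb{Q}^+}(\Sigma) \subsetneq \mathcal{S}^{rat}_{\mathbb{R}^+}(\Sigma) \cap \mathbb{Q}^+\langle\langle\Sigma\rangle\rangle$.

Proof. We use an element of $\mathbb{R}^{+\,rat}\langle\langle\Sigma\rangle\rangle \cap \mathbb{Q}^+\langle\langle\Sigma\rangle\rangle \setminus \mathbb{Q}^{+\,rat}\langle\langle\Sigma\rangle\rangle$ described in
[BR84] to prove the proposition.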

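Consider the multiplicity automaton $A = \langle \Sigma, Q, \varphi, \iota, \tau \rangle$ where $\Sigma = \{a, b\}$,
$Q = \{q_0, q_1\}$, $\iota(q_0) = \iota(q_1) = 1$, $\varphi(q_0, a, q_0) = \alpha^2$, $\varphi(q_0, b, q_0) = \alpha^{-2}$,
$\varphi(q_1, a, q_1) = \alpha^{-2}$, $\varphi(q_1, b, q_1) = \alpha^2$ where $\alpha = (\sqrt{5} + 1)/2$, $\varphi(q_i, x, q_j) = 0$ for any
$x \in \Sigma$ when $i \neq j$ and $\tau(q_0) = \tau(q_1) = 1$ (see Figure 3).

Let $r_A$ be the rational series generated by $A$. Let $w \in \Sigma^*$. We have
$r_A(w) = \alpha^{2n} + \alpha^{-2n}$ where $n = |w|_a - |w|_b$. Check that for any integer $n$,
$\alpha^{2n} + \alpha^{-2n} \in \mathbb{N}$. Hence, $r_A \in \mathbb{R}^{+\,rat}\langle\langle\Sigma\rangle\rangle \cap \mathbb{Q}^+\langle\langle\Sigma\rangle\rangle$. It is shown in [BR84] that
$r_A \notin \mathbb{Q}^{+\,rat}\langle\langle\Sigma\rangle\rangle$.

Now let $A' = \langle \Sigma, Q, \varphi', \iota', \tau' \rangle$ where for any states $q$ and $q'$ and any letter $x$,
$\iota'(q) = 1/2$, $\varphi'(q, x, q') = \varphi(q, x, q')/4$ and $\tau'(q_0) = \tau'(q_1) = 1/4$. Check that
$\alpha^2 + \alpha^{-2} = 3$. Then, $A'$ is a probabilistic automaton. Let $p$ be the stochastic language
generated by $A'$. We have

$$p(w) = \frac{\alpha^{2n} + \alpha^{-2n}}{2^{2|w|+3}} \quad \text{where } n = |w|_a - |w|_b$$

and hence $p \in \mathcal{S}^{rat}_{\mathbb{R}^+}(\Sigma) \cap \mathbb{Q}^+\langle\langle\Sigma\rangle\rangle$.

Let $s$ be the series defined by $s(w) = 2^{2|w|+3}$. Clearly, $s \in \mathbb{Q}^{+\,rat}\langle\langle\Sigma\rangle\rangle$ and
$r_A = s \odot p$ (Hadamard product). Recall that when $K$ is commutative, the Hadamard
product of two rational series is a rational series. Therefore
$p \in \mathbb{Q}^{+\,rat}\langle\langle\Sigma\rangle\rangle \Rightarrow r_A \in \mathbb{Q}^{+\,rat}\langle\langle\Sigma\rangle\rangle$ and hence, $p \notin \mathcal{S}^{rat}_{\mathbb{Q}^+}(\Sigma)$. □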




Fig. 3. $A'$ generates a rational stochastic language $p_{A'}$ which takes all its values in
$\mathbb{Q}$. However, $p_{A'}$ is not a rational stochastic language over $\mathbb{Q}^+$. $A''$ is a multiplicity
automaton over $\mathbb{Q}$ which generates $p_{A'}$.



Remark that since $p$ is a rational stochastic language which takes all its values in $\mathbb{Q}$,
$p$ is a rational stochastic language over $\mathbb{Q}$, from Prop. 8. Let $p_0 = p_{A',q_0}$ and $p_1 = p_{A',q_1}$
be the stochastic languages generated from the states $q_0$ and $q_1$ of the automaton $A'$. It can
easily be shown that

$$p = \tfrac{1}{2} p_0 + \tfrac{1}{2} p_1, \qquad a^{-1}p = \tfrac{\alpha^{2}}{3} p_0 + \tfrac{\alpha^{-2}}{3} p_1.$$

These relations make it possible to build, from $p$ and $a^{-1}p$, an automaton which recognizes
$p$. Check that

$$\dot{a}p = \tfrac{3}{8}\, a^{-1}p, \quad \dot{b}p = \tfrac{3}{4}\, p - \tfrac{3}{8}\, a^{-1}p, \quad \dot{a}a^{-1}p = -\tfrac{1}{6}\, p + \tfrac{3}{4}\, a^{-1}p \quad \text{and} \quad \dot{b}a^{-1}p = \tfrac{1}{6}\, p.$$

These relations can be used to prove that the automaton $A''$ in Fig. 3 generates $p$.
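The claims above can be checked with exact rational arithmetic. The following is a minimal sketch, not the authors' construction: it assumes that the two states of A'' carry the series p and a^{-1}p, with initial vector (1, 0), terminal vector (1/4, 1/4), and transition weights read off the relations a.p = (3/8) a^{-1}p, b.p = (3/4) p - (3/8) a^{-1}p, a.a^{-1}p = -(1/6) p + (3/4) a^{-1}p and b.a^{-1}p = (1/6) p, which follow from the definition of A'. The names `u`, `p_closed_form`, `p_automaton` and `agree_up_to` are hypothetical.

```python
from fractions import Fraction as F
from itertools import product


def u(n: int) -> int:
    """Return alpha**(2n) + alpha**(-2n) using integers only.

    alpha**2 and alpha**-2 are the two roots of X**2 - 3X + 1, hence
    u(0) = 2, u(1) = 3, u(n+1) = 3*u(n) - u(n-1), and u(-n) = u(n).
    """
    n = abs(n)
    prev, cur = 2, 3
    if n == 0:
        return prev
    for _ in range(n - 1):
        prev, cur = cur, 3 * cur - prev
    return cur


def p_closed_form(word: str) -> F:
    """p(w) = (alpha^(2n) + alpha^(-2n)) / 2^(2|w|+3), n = |w|_a - |w|_b."""
    n = word.count("a") - word.count("b")
    return F(u(n), 2 ** (2 * len(word) + 3))


# Assumed two-state automaton over Q: state 0 carries p, state 1 carries a^{-1}p.
IOTA = (F(1), F(0))
TAU = (F(1, 4), F(1, 4))
PHI = {
    "a": ((F(0), F(3, 8)), (F(-1, 6), F(3, 4))),
    "b": ((F(3, 4), F(-3, 8)), (F(1, 6), F(0))),
}


def p_automaton(word: str) -> F:
    """Value iota * M_{w_1} ... M_{w_k} * tau computed by the automaton."""
    vec = list(TAU)
    for x in reversed(word):  # apply the matrices from right to left
        m = PHI[x]
        vec = [m[i][0] * vec[0] + m[i][1] * vec[1] for i in range(2)]
    return IOTA[0] * vec[0] + IOTA[1] * vec[1]


def agree_up_to(max_len: int) -> bool:
    """Compare both definitions on every word of length <= max_len."""
    return all(
        p_automaton("".join(w)) == p_closed_form("".join(w))
        for k in range(max_len + 1)
        for w in product("ab", repeat=k)
    )
```

Because `u` only uses integer arithmetic, the sketch also illustrates why p takes rational values even though the parameters of A' are irrational; the negative weights in `PHI` are what a probabilistic automaton over the positive rationals cannot use.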

Now, we prove that there exists a rational stochastic language over $\mathbb{Q}$ which is not
rational over $\mathbb{R}^+$. In particular, it cannot be generated by a probabilistic automaton.

Proposition 10. $\mathcal{S}^{rat}_{\mathbb{Q}}(\Sigma) \setminus \mathcal{S}^{rat}_{\mathbb{R}^+}(\Sigma) \neq \emptyset$.

Proof. Let $\Sigma = \{a, b\}$ and for any $w \in \Sigma^*$, let $r$ and $s$ be the series defined by
$r(w) = |w|_a$ and $s(w) = |w|_b$. They are rational over $\mathbb{Q}$ since they belong to a stable
finitely generated subsemimodule of $\mathbb{Q}\langle\langle\Sigma\rangle\rangle$. Indeed,

$$\dot{a}r = r + 1, \quad \dot{b}r = r, \quad \dot{a}s = s \quad \text{and} \quad \dot{b}s = s + 1.$$

Hence, the series $r - s$ and $(r - s)^2$, where the exponent refers to the Hadamard product,
are also rational over $\mathbb{Q}$. For any $n \in \mathbb{N}$, let $\sigma_n = \sum_{w \in \Sigma^n} (r - s)^2(w) \leq n^2 2^n$. Check
that

$$\sigma_n = n 2^n \quad \text{and} \quad \sigma = \sum_{n \geq 0} \frac{\sigma_n}{2^{2n}} = 2.$$

Now, let $t$ be the series defined by

$$t(w) = \frac{(r - s)^2(w)}{\sigma \cdot 2^{2|w|}}.$$

$t$ is a rational stochastic language over $\mathbb{Q}$. Its support is the set
$supp(t) = \{w \in \Sigma^* \mid |w|_a \neq |w|_b\}$ which is known to be not rational. If $t$ were rational
over $\mathbb{R}^+$, its support would be rational. Therefore, $t \in \mathcal{S}^{rat}_{\mathbb{Q}}(\Sigma) \setminus \mathcal{S}^{rat}_{\mathbb{R}^+}(\Sigma)$. □
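The two identities the proof asks the reader to check can be verified numerically. The sketch below is illustrative only: it groups the words of length n by their number of occurrences of a, and the function names `sigma_n`, `t` and `partial_mass` are hypothetical.

```python
from fractions import Fraction
from math import comb


def sigma_n(n: int) -> int:
    """Sum of (|w|_a - |w|_b)**2 over all words w of length n over {a, b}.

    There are comb(n, k) words with exactly k letters a, and for each of
    them |w|_a - |w|_b = 2k - n.
    """
    return sum(comb(n, k) * (2 * k - n) ** 2 for k in range(n + 1))


def t(word: str) -> Fraction:
    """t(w) = (|w|_a - |w|_b)**2 / (sigma * 2**(2|w|)) with sigma = 2."""
    d = word.count("a") - word.count("b")
    return Fraction(d * d, 2 * 4 ** len(word))


def partial_mass(max_len: int) -> Fraction:
    """Total value of t on all words of length <= max_len."""
    return sum(Fraction(sigma_n(n), 2 * 4 ** n) for n in range(max_len + 1))
```

Under these definitions, `partial_mass(N)` equals 1 - (N + 2)/2^(N+1), which tends to 1, and `t` vanishes exactly on the words with as many a as b.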

All these results are summarized in Figure 4.



[Diagram not recoverable from the extracted text.]

Fig. 4. Inclusion relations between classes of rational stochastic languages. 



3.2 Residual languages of rational stochastic languages 

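Recall that given a stochastic language $p \in \mathcal{S}(\Sigma)$ and a word $u \in res(p)$, i.e. such
that $p(u\Sigma^*) \neq 0$, the residual language of $p$ wrt $u$ is the stochastic language defined
by

$$u^{-1}p(w) = \frac{p(uw)}{p(u\Sigma^*)}.$$

When $p$ takes its values in $\mathbb{Q}^+$, it is not true in general that $u^{-1}p$ also takes its values
in $\mathbb{Q}^+$.

Consider two sequences $(\alpha_n)_{n \in \mathbb{N}}$ and $(\beta_n)_{n \in \mathbb{N}}$ over $\mathbb{Q}^+$, with $\alpha_0 \neq 0$, such that
$\sum_{n \geq 0} \alpha_n = \sqrt{2}/2$ and $\sum_{n \geq 0} \beta_n = 4/5 - \sqrt{2}/2$. Now, consider the series
$r \in \mathbb{Q}^+\langle\langle\{a, b\}\rangle\rangle$ defined by $r(\varepsilon) = 1/5$, $r(a^n) = \alpha_{n-1}$, $r(b^n) = \beta_{n-1}$ for $n \geq 1$ and
$r(w) = 0$ otherwise. It is easy to check that $r$ is a stochastic language which takes its
values in $\mathbb{Q}^+$ and that $a^{-1}r(\varepsilon) = \sqrt{2}\,\alpha_0$. Therefore, $a^{-1}r \notin \mathbb{Q}\langle\langle\Sigma\rangle\rangle$.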

We prove below that when $p$ is a rational stochastic language over $K$, all its residual
languages are also rational over $K$. Moreover, the set $Res(p) = \{u^{-1}p \mid u \in res(p)\}$
generates the same subsemimodule of $K\langle\langle\Sigma\rangle\rangle$ as the set $\{\dot{u}p \mid u \in \Sigma^*\}$.

We first need two technical lemmas from linear algebra to prove this result.

Lemma 1. Let $f : \mathbb{Q}^n \to \mathbb{Q}^n$ be a linear mapping and let $t \in \mathbb{Q}^n$ be such that $\sum_{k \geq 0} f^k t$
converges to $u$. Then $u \in \mathbb{Q}^n$.

Proof. Let $F$ be the vector subspace of $\mathbb{Q}^n$ generated by $\{f^k t \mid k \in \mathbb{N}\}$. There exists
an integer $d$ such that $f^0 t = t, \dots, f^{d-1}t$ is a basis of $F$. As the sum $\sum_{k \geq 0} f^k t$
converges, $f^k t$ converges to 0 when $k$ tends to infinity. Therefore, for any $v \in F$,
$f^k v$ also converges to 0 when $k$ tends to infinity. Let $v \in F$ be such that $fv = v$. We
also have $f^k v = v$ for any integer $k$ and hence, $v = 0$. Let $g : F \to F$ be defined by
$g(v) = v - fv$. The linear mapping $g$ is one-to-one and for any $v \in F$ and any integer
$k$,

$$v + fv + \dots + f^k v = g^{-1}(1 - f^{k+1})(v).$$

Therefore,

$$u = g^{-1}t \quad \text{and} \quad u \in \mathbb{Q}^n.$$ □
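Lemma 1 can be illustrated by computing the limit exactly. The sketch below is a simplified instance, under the assumption that Id - f is invertible on the whole space (the proof only needs this on the subspace F); the names `solve`, `series_limit`, `partial_sum`, `F_EXAMPLE` and `T_EXAMPLE` are hypothetical.

```python
from fractions import Fraction


def solve(matrix, rhs):
    """Solve matrix * x = rhs exactly over Q by Gauss-Jordan elimination."""
    n = len(rhs)
    aug = [[Fraction(v) for v in row] + [Fraction(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("singular system: Id - f is not invertible")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def series_limit(f, t):
    """Return u = (Id - f)^{-1} t, the limit of sum_k f^k t when it converges."""
    n = len(t)
    i_minus_f = [
        [(1 if i == j else 0) - Fraction(f[i][j]) for j in range(n)] for i in range(n)
    ]
    return solve(i_minus_f, t)


def partial_sum(f, t, terms):
    """Return t + f t + ... + f^(terms-1) t, computed exactly."""
    n = len(t)
    total = [Fraction(0)] * n
    term = [Fraction(v) for v in t]
    for _ in range(terms):
        total = [a + b for a, b in zip(total, term)]
        term = [sum(Fraction(f[i][j]) * term[j] for j in range(n)) for i in range(n)]
    return total


# Illustrative data: a rational matrix whose powers tend to 0.
F_EXAMPLE = [[Fraction(1, 2), Fraction(1, 4)], [Fraction(0), Fraction(1, 3)]]
T_EXAMPLE = [1, 1]
```

The limit is obtained by a finite computation over the rationals, which is exactly why it cannot leave $\mathbb{Q}^n$ even though it is defined as an infinite sum.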



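We use Lemma 1 to show that if $\{r_1, \dots, r_n\}$ generates a stable subsemimodule
of $\mathbb{Q}\langle\langle\Sigma\rangle\rangle$ and if each sum $\sum_{w \in \Sigma^*} r_i(w)$ converges to $\sigma_i$ then each $\sigma_i \in \mathbb{Q}$.

Lemma 2. Let $M$ be a stable subsemimodule of $\mathbb{Q}\langle\langle\Sigma\rangle\rangle$ generated by $\{r_1, \dots, r_n\}$
and let $\sigma_i^k = \sum_{w \in \Sigma^k} r_i(w)$ for any $1 \leq i \leq n$ and any integer $k$. Suppose that for any
$1 \leq i \leq n$, the sum $\sum_{k \geq 0} \sigma_i^k$ converges to $\sigma_i$. Then $\sigma_i \in \mathbb{Q}$ for any $1 \leq i \leq n$.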

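Proof. Let $t = (r_1(\varepsilon), \dots, r_n(\varepsilon))^t$. As $M$ is stable, there exist $\alpha^x_{i,j} \in \mathbb{Q}$ for any
$1 \leq i, j \leq n$ and any $x \in \Sigma$ such that $\dot{x}r_i = \sum_{j=1}^{n} \alpha^x_{i,j} \cdot r_j$. Let $B \in \mathbb{Q}^{n \times n}$ be defined by
$B[i,j] = \sum_{x \in \Sigma} \alpha^x_{i,j}$. Let us prove by induction on $k$ that for any integer $k$, we have
$(\sigma_1^k, \dots, \sigma_n^k)^t = B^k t$. The property is true for $k = 0$ as for any integer $i$, $\sigma_i^0 = r_i(\varepsilon)$.
Now,

$$\sigma_i^{k+1} = \sum_{w \in \Sigma^k,\, x \in \Sigma} \dot{x}r_i(w) = \sum_{w \in \Sigma^k,\, x \in \Sigma,\, j \in \{1,\dots,n\}} \alpha^x_{i,j}\, r_j(w) = \sum_{j \in \{1,\dots,n\}} B[i,j]\, \sigma_j^k$$

$$= \sum_{j \in \{1,\dots,n\}} B[i,j]\,(B^k t)[j] \quad \text{by induction hypothesis}$$

$$= (B^{k+1} t)[i].$$

Therefore, $\sum_{k \geq 0} B^k t$ converges to $(\sigma_1, \dots, \sigma_n)^t$. From Lemma 1, $\sigma_i \in \mathbb{Q}$ for any
$1 \leq i \leq n$. □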

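Lemma 3. Let $p \in \mathcal{S}^{rat}_K(\Sigma)$. For any word $u \in res(p)$, $\sum_{w \in \Sigma^*} p(uw) \in K$. Moreover,
the set $Res(p)$ generates the same subsemimodule of $K\langle\langle\Sigma\rangle\rangle$ as the set
$\{\dot{u}p \mid u \in \Sigma^*\}$.

Proof. Let $p \in \mathcal{S}^{rat}_K(\Sigma)$. For any word $u$, $\sum_{w \in \Sigma^*} p(uw) \in \mathbb{R}^+$ since $p$ is a stochastic
language. Suppose now that $K = \mathbb{Q}$ or $K = \mathbb{Q}^+$. The set $\{\dot{u}p \mid u \in \Sigma^*\}$ generates
a finite-dimensional vector subspace $V$ of $\mathbb{Q}\langle\langle\Sigma\rangle\rangle$. Let $\{\dot{u}_1 p, \dots, \dot{u}_n p\}$ be a finite subset of
$\{\dot{u}p \mid u \in \Sigma^*\}$ which generates $V$. Let $\sigma_i = \sum_{w \in \Sigma^*} \dot{u}_i p(w)$ for any $i = 1, \dots, n$.
From Lemma 2, each $\sigma_i \in \mathbb{Q}$. Now, for any $u \in \Sigma^*$, there exist $\alpha_1, \dots, \alpha_n \in \mathbb{Q}$ such
that $\dot{u}p = \sum_{i=1}^{n} \alpha_i \dot{u}_i p$. Therefore, $\sum_{w \in \Sigma^*} p(uw) = \sum_{i=1}^{n} \alpha_i \sigma_i \in \mathbb{Q}$.

So, for any $K$ and any $u \in res(p)$, there exists an invertible element $\alpha_u$ of $K$ such
that $\dot{u}p = \alpha_u \cdot u^{-1}p$. In consequence, the set $Res(p)$ generates the same subsemimodule
of $K\langle\langle\Sigma\rangle\rangle$ as the set $\{\dot{u}p \mid u \in \Sigma^*\}$. □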

For any stochastic language $p$ over $K$, let us denote by $[Res(p)]$ the subsemimodule
of $K\langle\langle\Sigma\rangle\rangle$ generated by $Res(p)$ and let us call it the residual subsemimodule of $p$.
Note that $[Res(p)]$ is stable.

Proposition 11. Let $p \in \mathcal{S}^{rat}_K(\Sigma)$. For any word $u \in res(p)$, $u^{-1}p \in \mathcal{S}^{rat}_K(\Sigma)$.

Proof. From Lemma 3, the residual stochastic languages $u^{-1}p$ belong to the same
stable subsemimodules of $K\langle\langle\Sigma\rangle\rangle$ as $p$. Therefore, they are rational over $K$. □

3.3 Characterization of $\mathcal{S}^{rat}_K(\Sigma)$ in terms of stable subsemimodules

We show in this section that a series $p$ over $K$ is a rational stochastic language if and
only if there exists a finite subset $S$ in $\mathcal{S}(\Sigma)$ which generates a stable subsemimodule
of $K\langle\langle\Sigma\rangle\rangle$ and such that $p \in conv_K(S)$.
The "if" part is easy to prove.






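Proposition 12. Let $p \in K\langle\langle\Sigma\rangle\rangle$. Suppose that there exists a finite subset $S$ in $\mathcal{S}(\Sigma)$
which generates a stable subsemimodule of $K\langle\langle\Sigma\rangle\rangle$ and such that $p \in conv_K(S)$.
Then $p \in \mathcal{S}^{rat}_K(\Sigma)$.

Proof. Let $\{p_1, \dots, p_n\}$ be a finite subset of $\mathcal{S}(\Sigma)$ which generates a stable
subsemimodule of $K\langle\langle\Sigma\rangle\rangle$ and let $p = \sum_{i=1}^{n} \alpha_i p_i$ where $\alpha_i \geq 0$ for $i = 1, \dots, n$ and
$\sum_{i=1}^{n} \alpha_i = 1$. From Theorem 2, $p$ is a rational series over $K$, and $p$ is a stochastic
language since $p(w) = \sum_{i=1}^{n} \alpha_i p_i(w) \geq 0$ for any word $w$ and
$p(\Sigma^*) = \sum_{i=1}^{n} \alpha_i p_i(\Sigma^*) = 1$. □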

The converse proposition is easy to prove when $K = \mathbb{Q}$ or $K = \mathbb{R}$. It is slightly
more complicated when $K$ is not a field.

Proposition 13. Let $p \in \mathcal{S}^{rat}_K(\Sigma)$. Then there exists a finite subset $S$ in $\mathcal{S}(\Sigma)$ which
generates a stable subsemimodule of $K\langle\langle\Sigma\rangle\rangle$ and such that $p \in conv_K(S)$.

Proof. Let $p \in \mathcal{S}^{rat}_K(\Sigma)$.

When $K = \mathbb{Q}$ or $K = \mathbb{R}$, $K$ is a commutative field, $K\langle\langle\Sigma\rangle\rangle$ is a vector space
and subsemimodules of $K\langle\langle\Sigma\rangle\rangle$ are vector subspaces of $K\langle\langle\Sigma\rangle\rangle$. From Lemma 3, the
subspaces generated by $\{\dot{u}p \mid u \in \Sigma^*\}$ and $\{u^{-1}p \mid u \in res(p)\}$ coincide. From Theorem 2,
$\{u^{-1}p \mid u \in res(p)\}$ generates a stable finite-dimensional vector subspace $V$ of $K\langle\langle\Sigma\rangle\rangle$. Let $S$ be
a finite subset of $\{u^{-1}p \mid u \in res(p)\}$ which contains $p$ and generates $V$. Clearly,
$S \subseteq \mathcal{S}(\Sigma)$ and $p \in conv_K(S)$.

Let $K = \mathbb{Q}^+$ or $K = \mathbb{R}^+$. From Theorem 2, let $R = \{r_1, \dots, r_n\}$ be a finite
subset of $K\langle\langle\Sigma\rangle\rangle$ which generates a stable subsemimodule $M$ containing $p$. We
may suppose that $0 \notin R$ as $R$ and $R \setminus \{0\}$ generate the same subsemimodule. Let
$S = \{r \in R \mid \sum_{w \in \Sigma^*} r(w) < \infty\}$. First, let us show that $S$ also generates a stable
subsemimodule containing $p$. Let $T = R \setminus S$. Let $s \in S$ and let $u \in \Sigma^*$. As $M$
is stable, we can write $\dot{u}s = \sum_{r \in R} \alpha^u_r r$ where the coefficients $\alpha^u_r$ belong to $K$. As
$s \in S$, $\sum_{w \in \Sigma^*} \dot{u}s(w) < \infty$. Therefore, $r \in T \Rightarrow \alpha^u_r = 0$ and $S$ generates a stable
subsemimodule. In a similar way, we can write $p = \sum_{r \in R} \beta_r r$ and as $p$ is a stochastic
language, $r \in T \Rightarrow \beta_r = 0$ and $p$ belongs to the semimodule generated by $S$.

Now, let $S' = \{(\sum_{w \in \Sigma^*} s(w))^{-1} \cdot s \mid s \in S\}$. Clearly, each element of $S'$ is a
stochastic language and an element of $K\langle\langle\Sigma\rangle\rangle$ (by using Lemma 2 when $K = \mathbb{Q}^+$).
$S'$ generates the same stable semimodule as $S$. We can write $p = \sum_{s \in S'} \beta_s s$, where
the coefficients $\beta_s$ belong to $K$. As $p$ and each element of $S'$ are stochastic languages,
we have $\sum_{s \in S'} \beta_s = 1$ and hence, $p \in conv_K(S')$. □

Putting together the previous propositions, we obtain the following theorem: 

Theorem 4. Let $K \in \{\mathbb{R}, \mathbb{Q}, \mathbb{R}^+, \mathbb{Q}^+\}$. A series $p$ over $K$ is a rational stochastic
language if and only if there exists a finite subset $S$ in $\mathcal{S}(\Sigma)$ which generates a stable
subsemimodule of $K\langle\langle\Sigma\rangle\rangle$ and such that $p \in conv_K(S)$.

Proof. Apply Propositions 12 and 13. □

3.4 Subclasses of rational languages defined in terms of properties of their set
of residual languages

Let $p$ be a rational stochastic language over $K$. The set $Res(p)$ composed of the
stochastic residual languages of $p$ is included in a stable finitely generated
subsemimodule of $K\langle\langle\Sigma\rangle\rangle$ but it may happen that the residual subsemimodule $[Res(p)]$ of $p$
is not finitely generated. See Example 1 for instance. Conversely, a stochastic
language whose residual subsemimodule is finitely generated is rational. Therefore, two
subclasses of $\mathcal{S}^{rat}_K(\Sigma)$ can be naturally defined:

- the set $\mathcal{S}^{fingen}_K(\Sigma)$ composed of rational stochastic languages over $K$ whose
residual subsemimodule is finitely generated;

- the set $\mathcal{S}^{fin}_K(\Sigma)$ composed of rational stochastic languages over $K$ such that $Res(p)$
is finite.



Stochastic languages with finitely many residual languages. Every stochastic
language with finitely many residual languages can be described by using positive
parameters only. In consequence, we obtain a Fatou-like property: every stochastic
language with finitely many residual languages and which takes its values in $\mathbb{Q}$ is rational
over $\mathbb{Q}^+$. Of course, for any $K$, there exist rational stochastic languages over $K$ whose
residual subsemimodule is finitely generated and which do not have finitely many residual
languages.

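Proposition 14. 1. $\mathcal{S}^{fin}_{\mathbb{R}}(\Sigma) = \mathcal{S}^{fin}_{\mathbb{R}^+}(\Sigma)$.

2. $\mathcal{S}^{fin}_{\mathbb{Q}}(\Sigma) = \mathcal{S}^{fin}_{\mathbb{Q}^+}(\Sigma) = \mathcal{S}^{fin}_{\mathbb{R}}(\Sigma) \cap \mathbb{Q}^+\langle\langle\Sigma\rangle\rangle$.

3. For any $K \in \{\mathbb{R}, \mathbb{Q}, \mathbb{R}^+, \mathbb{Q}^+\}$, $\mathcal{S}^{fin}_K(\Sigma) \subsetneq \mathcal{S}^{fingen}_K(\Sigma)$.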

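Proof. 1. It is sufficient to show that $\mathcal{S}^{fin}_{\mathbb{R}}(\Sigma) \subseteq \mathcal{S}^{fin}_{\mathbb{R}^+}(\Sigma)$ in order to prove the first
equality. Let $p \in \mathcal{S}^{fin}_{\mathbb{R}}(\Sigma)$ and let $Res(p) = \{u_1^{-1}p, \dots, u_n^{-1}p\}$ be the set of
residual languages of $p$. For any $u \in \Sigma^*$ and any $i \in \{1, \dots, n\}$, there exists
$j \in \{1, \dots, n\}$ such that $\dot{u}u_i^{-1}p = u_i^{-1}p(u\Sigma^*) \cdot u_j^{-1}p$. Since $u_i^{-1}p(u\Sigma^*) \geq 0$, $Res(p)$
generates a stable subsemimodule of $\mathbb{R}^+\langle\langle\Sigma\rangle\rangle$. Since $p \in Res(p)$, $p \in \mathcal{S}^{fin}_{\mathbb{R}^+}(\Sigma)$
from Theorem 4.

2. The proof of the first equality goes in a similar way, with the complementary
argument that $u_i^{-1}p(u\Sigma^*) \in \mathbb{Q}$ from Lemma 3.

Now, let $p \in \mathcal{S}^{fin}_{\mathbb{R}}(\Sigma) \cap \mathbb{Q}^+\langle\langle\Sigma\rangle\rangle$. From Prop. 8, $p \in \mathcal{S}^{rat}_{\mathbb{Q}}(\Sigma)$. Therefore,
$p \in \mathcal{S}^{fin}_{\mathbb{Q}}(\Sigma)$.

3. Consider the probabilistic automaton defined on Fig. 5. It defines a stochastic
language $p$ over $\mathbb{Q}^+$. Let us show that $p \in \mathcal{S}^{fingen}_{\mathbb{Q}^+}(\Sigma) \setminus \mathcal{S}^{fin}_{\mathbb{R}^+}(\Sigma)$.

First, let us show by induction on $n$ that for any integer $n$, there exist $\alpha_n, \beta_n \in \mathbb{Q}^+$
such that $\dot{a}^n p = \alpha_n p + \beta_n a^{-1}p$. This is true when $n = 0$: take $\alpha_0 = 1$ and $\beta_0 = 0$.

[Figure 5: automaton diagram not recoverable from the extracted text; the only surviving transition labels are "a, 1/2".]

Fig. 5. The automaton $A$ generates a stochastic language over $\mathbb{Q}^+$ whose residual
subsemimodule is finitely generated but which has infinitely many residual languages.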



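Suppose that the relation holds for the integer $n$. For any word $u$, we have:

$$\dot{a}^{n+1}p(u) = \dot{a}^n p(au)$$

$$= \alpha_n p(au) + \beta_n a^{-1}p(au) \quad \text{by induction hypothesis}$$

$$= \frac{\alpha_n}{2}\, a^{-1}p(u) + \beta_n \left( \frac{1}{2}\, p(u) + \frac{1}{2}\, a^{-1}p(u) \right) \quad \text{by remarking that } p = p_{q_0} \text{ and } a^{-1}p = p_{q_1}.$$

So we can take $\alpha_{n+1} = \beta_n/2$ and $\beta_{n+1} = (\alpha_n + \beta_n)/2$ which belong to $\mathbb{Q}^+$
from induction hypothesis. Therefore the module $[Res(p)]$ is finitely generated
from Lemma 3: $p \in \mathcal{S}^{fingen}_{\mathbb{Q}^+}(\Sigma)$ and therefore, $p \in \mathcal{S}^{fingen}_K(\Sigma)$ for any
$K \in \{\mathbb{R}, \mathbb{Q}, \mathbb{R}^+, \mathbb{Q}^+\}$.

Let $\gamma_n = (a^n)^{-1}p(\varepsilon)$. We have

$$\gamma_n = \frac{\alpha_n p(\varepsilon) + \beta_n a^{-1}p(\varepsilon)}{\alpha_n + \beta_n} = \frac{\alpha_n}{2(\alpha_n + \beta_n)}.$$

Check that $\gamma_n$ satisfies the following induction relation:

$$\gamma_{n+1} = \frac{1 - 2\gamma_n}{4(1 - \gamma_n)}.$$

The sequence $(\gamma_n)$ converges to the irrational number $(3 - \sqrt{5})/4$ and therefore,
$\gamma_n = (a^n)^{-1}p(\varepsilon)$ takes an infinite number of values, which implies that $p$ has
infinitely many residual languages. □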
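The behaviour of the sequence of values taken at the empty word by the residuals of p with respect to a, aa, aaa, and so on, can be observed with exact arithmetic. The sketch below rests on an assumption, since Figure 5 is not legible here: it uses the recurrences alpha_{n+1} = beta_n / 2 and beta_{n+1} = (alpha_n + beta_n) / 2 with alpha_0 = 1, beta_0 = 0, together with p(epsilon) = 1/2 and a^{-1}p(epsilon) = 0, which is what the closed form of gamma_n in the proof is consistent with. The names `gammas`, `next_gamma` and `LIMIT` are hypothetical.

```python
from fractions import Fraction
from math import sqrt


def gammas(count: int) -> list:
    """Return gamma_0, ..., gamma_{count-1} as exact fractions.

    gamma_n = alpha_n / (2 * (alpha_n + beta_n)), with the assumed
    recurrences alpha_{n+1} = beta_n / 2, beta_{n+1} = (alpha_n + beta_n) / 2.
    """
    alpha, beta = Fraction(1), Fraction(0)
    values = []
    for _ in range(count):
        values.append(alpha / (2 * (alpha + beta)))
        alpha, beta = beta / 2, (alpha + beta) / 2
    return values


def next_gamma(g: Fraction) -> Fraction:
    """The induction relation gamma_{n+1} = (1 - 2 gamma_n) / (4 (1 - gamma_n))."""
    return (1 - 2 * g) / (4 * (1 - g))


# Irrational limit stated in the proof, as a float for comparison only.
LIMIT = (3 - sqrt(5)) / 4
```

Every term is rational while the limit is not, so the sequence can never become stationary; this is the reason the set of residual languages is infinite.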

Stochastic languages whose residual subsemimodule is finitely generated. When
$K$ is a field, the residual subsemimodule of every rational stochastic language is finitely
generated. This property is no longer true when $K \in \{\mathbb{R}^+, \mathbb{Q}^+\}$. In consequence, some
stochastic languages whose residual subsemimodule is finitely generated cannot be generated
by using only positive parameters.

We prove also a Fatou-like property: every stochastic language over $\mathbb{R}^+$ whose
residual subsemimodule is finitely generated and which takes its values in $\mathbb{Q}$ is rational
over $\mathbb{Q}^+$. But we first need the following technical lemmas.

Lemma 4. Let $k, n \in \mathbb{N}$ and let $\alpha_i, \beta_i^j \in \mathbb{Q}$ for $1 \leq i \leq n$ and $1 \leq j \leq k$. Consider
the variables $x_1, \dots, x_k$ and the system $(S)$ composed of the $n$ following inequations

$$\alpha_i + \sum_{j=1}^{k} x_j \beta_i^j \geq 0$$

for $i = 1, \dots, n$. If $(S)$ has a solution, then it also has a solution
$(\mu_1, \dots, \mu_k) \in \mathbb{Q}^k$, i.e. rational numbers which satisfy

$$\alpha_i + \sum_{j=1}^{k} \mu_j \beta_i^j \geq 0$$

for $i = 1, \dots, n$.

Proof. By induction on $n$.

- Let $n = 1$. Let $\mu_1, \dots, \mu_k$ be such that $\alpha_1 + \sum_{j=1}^{k} \mu_j \beta_1^j \geq 0$. If
$\alpha_1 + \sum_{j=1}^{k} \mu_j \beta_1^j = 0$, we are done. If $\alpha_1 + \sum_{j=1}^{k} \mu_j \beta_1^j > 0$, there exist
$\mu'_1, \dots, \mu'_k \in \mathbb{Q}$ such that $\alpha_1 + \sum_{j=1}^{k} \mu'_j \beta_1^j > 0$ since $\mathbb{Q}$ is dense in $\mathbb{R}$ and since
$\alpha_1 + \sum_{j=1}^{k} \mu_j \beta_1^j$ is a continuous expression of the $\mu_j$.

- Let $n > 1$ and let $\mu_1, \dots, \mu_k$ be such that $\alpha_i + \sum_{j=1}^{k} \mu_j \beta_i^j \geq 0$ for any $1 \leq i \leq n$.
If $\alpha_i + \sum_{j=1}^{k} \mu_j \beta_i^j > 0$ for any integer $i$, then there exist $\mu'_1, \dots, \mu'_k \in \mathbb{Q}$ such
that $\alpha_i + \sum_{j=1}^{k} \mu'_j \beta_i^j > 0$ for any $i$, by using the same argument as previously.
Otherwise, there exists at least an integer $i$ such that $\alpha_i + \sum_{j=1}^{k} \mu_j \beta_i^j = 0$.

  • If each $\beta_i^j = 0$, then $\alpha_i$ is also null and this equation can be ruled out from the
  system without modifying its solutions. In this case, the induction hypothesis
  can be directly applied.

  • If there exists $j$ such that $\beta_i^j \neq 0$, then $\mu_j$ can be expressed as a function of
  the other $\mu_l$: $\mu_j = -(\alpha_i + \sum_{l \neq j} \mu_l \beta_i^l)/\beta_i^j$; $x_j$ can be replaced with
  $-(\alpha_i + \sum_{l \neq j} x_l \beta_i^l)/\beta_i^j$ in all the other inequations and the induction hypothesis can
  be applied.

□

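Lemma 5. Let $r_0, r_1, \dots, r_n \in \mathbb{Q}\langle\langle\Sigma\rangle\rangle$ and let $\alpha_1, \dots, \alpha_n \in \mathbb{Q}$, $\beta_1, \dots, \beta_n \in \mathbb{R}^+$ be
such that

$$r_0 = \sum_{i=1}^{n} \alpha_i r_i = \sum_{i=1}^{n} \beta_i r_i.$$

Then, there exist $\gamma_1, \dots, \gamma_n \in \mathbb{Q}^+$ such that

$$r_0 = \sum_{i=1}^{n} \gamma_i r_i.$$

Proof. The set of parameters $\{(\lambda_1, \dots, \lambda_n) \in \mathbb{R}^n \mid \sum_{i=1}^{n} \lambda_i r_i = 0\}$ is a vector
subspace of $\mathbb{R}^n$. Since the series $r_1, \dots, r_n$ take their values in $\mathbb{Q}$, there exist $k$ vectors
$(t_1^1, \dots, t_n^1), \dots, (t_1^k, \dots, t_n^k) \in \mathbb{Q}^n$, with $k \leq n$, such that for any $(\lambda_1, \dots, \lambda_n) \in \mathbb{R}^n$,

$$\sum_{i=1}^{n} \lambda_i r_i = 0 \text{ iff } \exists \mu_1, \dots, \mu_k \in \mathbb{R} \text{ s.t. } \lambda_i = \sum_{j=1}^{k} \mu_j t_i^j \text{ for any } i = 1, \dots, n.$$

Hence, for any $(\lambda_1, \dots, \lambda_n) \in \mathbb{R}^n$,

$$r_0 = \sum_{i=1}^{n} \lambda_i r_i \text{ iff } \exists \mu_1, \dots, \mu_k \in \mathbb{R} \text{ s.t. } \lambda_i = \alpha_i + \sum_{j=1}^{k} \mu_j t_i^j \text{ for any } i = 1, \dots, n.$$

In particular, there exist $\mu_1, \dots, \mu_k$ such that $\beta_i = \alpha_i + \sum_{j=1}^{k} \mu_j t_i^j \geq 0$ for any
$i = 1, \dots, n$.

Consider the system composed of the $n$ inequations $\alpha_i + \sum_{j=1}^{k} x_j t_i^j \geq 0$ for
$i = 1, \dots, n$. It has a solution and from the previous Lemma, it has also a rational solution
$(\mu_1, \dots, \mu_k)$ which satisfies $\alpha_i + \sum_{j=1}^{k} \mu_j t_i^j \geq 0$ for $i = 1, \dots, n$. Taking
$\gamma_i = \alpha_i + \sum_{j=1}^{k} \mu_j t_i^j$ gives the required coefficients. □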

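Proposition 15. 1. When $K \in \{\mathbb{R}, \mathbb{Q}\}$, $\mathcal{S}^{fingen}_K(\Sigma) = \mathcal{S}^{rat}_K(\Sigma)$.

Proof. 1. When $K \in \{\mathbb{R}, \mathbb{Q}\}$, $K$ is a commutative field. As a consequence, any
vector subspace of a finitely generated vector subspace of $K\langle\langle\Sigma\rangle\rangle$ is finitely generated
itself. Therefore, for any $p \in \mathcal{S}^{rat}_K(\Sigma)$, the residual subsemimodule of $p$ is finitely
generated.

2. Example 1 describes a rational stochastic language whose residual subsemimodule
is not finitely generated.

3. Let $p \in \mathcal{S}^{fingen}_{\mathbb{R}^+}(\Sigma) \cap \mathbb{Q}^+\langle\langle\Sigma\rangle\rangle$. Let $S = \{r_1, \dots, r_n\} \subseteq Res(p)$ be a finite
subset which generates the same subsemimodule as $Res(p)$ in $\mathbb{R}^+\langle\langle\Sigma\rangle\rangle$. From
Prop. 8, $p \in \mathcal{S}^{rat}_{\mathbb{Q}}(\Sigma)$ and from Prop. 11, each $r_i \in \mathcal{S}^{rat}_{\mathbb{Q}}(\Sigma)$. $S$ also generates
the same subsemimodule as $Res(p)$ in $\mathbb{Q}\langle\langle\Sigma\rangle\rangle$. From Lemma 5, for any word $u$
and any index $i$, there exist $\gamma^u_{i,1}, \dots, \gamma^u_{i,n} \in \mathbb{Q}^+$ such that $\dot{u}r_i = \sum_{j=1}^{n} \gamma^u_{i,j} r_j$.
Therefore, $S$ generates a stable subsemimodule of $\mathbb{Q}^+\langle\langle\Sigma\rangle\rangle$. Also from Lemma 5,
there exist $\gamma_1, \dots, \gamma_n \in \mathbb{Q}^+$ such that $p = \sum_{i=1}^{n} \gamma_i r_i$. Therefore, $p \in conv_{\mathbb{Q}^+}(S)$
and $p \in \mathcal{S}^{fingen}_{\mathbb{Q}^+}(\Sigma)$.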

□ 

Remark that C 5^'"^^"(r)nQ+((i:)) since 5^™^"" (r) C 5™*(i7) = 

5£»*(i7) n Q+((i:)) = si'^'^'^iu) n Q+((i:)) 

Finally, we show that when $K$ is positive, finitely generated stochastic languages
over $K$ have a unique normal representation in terms of stable subsemimodules
generated by residual languages which is minimal with respect to inclusion.

Proposition 16. Let $K = \mathbb{Q}^+$ or $K = \mathbb{R}^+$ and let $p \in \mathcal{S}^{fingen}_K(\Sigma)$. Then, there
exists a unique finite subset $R \subseteq Res(p)$ which generates a stable subsemimodule of
$K\langle\langle\Sigma\rangle\rangle$, such that $p \in conv_K(R)$ and which is minimal for inclusion.

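Proof. Let $K = \mathbb{Q}^+$ or $K = \mathbb{R}^+$ and let $p \in \mathcal{S}^{fingen}_K(\Sigma)$. Let $R = \{r_1, \dots, r_n\}$
and $S = \{s_1, \dots, s_m\}$ be two minimal subsets of $Res(p)$ generating $[Res(p)]$. Let
$r_{j_0} \in R$. We are to prove that $r_{j_0} \in S$.

There exist $\alpha_{j_0}^1, \dots, \alpha_{j_0}^m \in K$ such that $r_{j_0} = \sum_{i=1}^{m} \alpha_{j_0}^i s_i$.

There exist $\beta_i^j \in K$ for any $1 \leq i \leq m$ and $1 \leq j \leq n$ such that $s_i = \sum_{j=1}^{n} \beta_i^j r_j$ for any
$1 \leq i \leq m$.

Therefore,

$$r_{j_0} = \sum_{i=1}^{m} \sum_{j=1}^{n} \alpha_{j_0}^i \beta_i^j r_j = \sum_{j=1}^{n} \left( \sum_{i=1}^{m} \alpha_{j_0}^i \beta_i^j \right) r_j.$$

If $\sum_{i=1}^{m} \alpha_{j_0}^i \beta_i^{j_0} \neq 1$, we could express $r_{j_0}$ as a convex combination of the
other $r_j$ and $R$ would not be minimal for inclusion. Therefore, $\sum_{i=1}^{m} \alpha_{j_0}^i \beta_i^{j_0} = 1$.

Since $\sum_{i=1}^{m} \alpha_{j_0}^i = 1$ and since $\beta_i^{j_0} \leq 1$, for any index $i$ such that $\alpha_{j_0}^i \neq 0$,
we must have $\beta_i^{j_0} = 1$. Therefore, for any index $i$ such that $\alpha_{j_0}^i \neq 0$, we must have
$s_i = r_{j_0}$. As such an index must exist, $r_{j_0} \in S$.

Since no condition has been put on $r_{j_0}$, then $R \subseteq S$ and finally, $R = S$. □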



Fig. 6. Inclusion relations between classes of rational stochastic languages,
including $\mathcal{S}^{fingen}_K(\Sigma)$ and $\mathcal{S}^{fin}_K(\Sigma)$.



4 Multiplicity automata and rational stochastic languages. 

In the previous sections, we have defined several classes of rational stochastic
languages over $K \in \{\mathbb{R}, \mathbb{Q}, \mathbb{R}^+, \mathbb{Q}^+\}$. In this section, we study the representation of these
classes by means of multiplicity automata: given a subclass $C$ of rational stochastic
languages over $K$, is there a subset of $K$-multiplicity automata both simple to identify
and sufficient to generate the elements of $C$? The first result we prove is negative: it is
undecidable whether a given multiplicity automaton over $\mathbb{Q}$ generates a stochastic
language. Moreover, there exists no recursively enumerable subset of multiplicity automata
over $\mathbb{Q}$ sufficient to generate $\mathcal{S}^{rat}_{\mathbb{Q}}(\Sigma)$. This result implies that no class of
multiplicity automata can efficiently represent the class of rational stochastic languages over
$\mathbb{Q}$ or $\mathbb{R}$. On the other hand, we show that the class of $K$-probabilistic automata
represents $\mathcal{S}^{rat}_K(\Sigma)$ when $K \in \{\mathbb{R}^+, \mathbb{Q}^+\}$. Clearly, it can be decided efficiently whether
a given multiplicity automaton is a probabilistic automaton. We show also that the
class of $K$-probabilistic residual automata represents the class $\mathcal{S}^{fingen}_K(\Sigma)$ for any
$K \in \{\mathbb{R}, \mathbb{R}^+, \mathbb{Q}, \mathbb{Q}^+\}$. We do not know whether the class of probabilistic residual
automata is decidable. However, we show that it contains a subclass which is decidable
and sufficient to generate $\mathcal{S}^{fingen}_K(\Sigma)$. Nevertheless, we show that deciding whether a
given MA is in this subclass is PSPACE-complete. Finally, the class of probabilistic
deterministic automata over $\mathbb{R}^+$ (resp. $\mathbb{Q}^+$), which is clearly decidable, represents the
class $\mathcal{S}^{fin}_K(\Sigma)$ when $K \in \{\mathbb{R}, \mathbb{R}^+\}$ (resp. $K \in \{\mathbb{Q}, \mathbb{Q}^+\}$).

To our knowledge, the decidability of the following problems is still open: 

- decide whether a given multiplicity automaton is equivalent to a probabilistic au- 
tomaton, or a probabilistic residual automaton or a probabilistic deterministic au- 
tomaton; 

- decide whether a given probabilistic automaton is equivalent to a probabilistic 
residual automaton or a probabilistic deterministic automaton; 

- decide whether a given probabilistic residual automaton is equivalent to a proba- 
bilistic deterministic automaton. 



4.1 The class of MA which generate stochastic languages is undecidable 

An MA $A$ generates a stochastic language $p_A$ if and only if

- $\forall w \in \Sigma^*$, $p_A(w) \geq 0$ and,

- $\sum_{w \in \Sigma^*} p_A(w) = 1$.

We first show that the second condition can be checked within polynomial time.
We need the following result:

Lemma 6. [Gan66,BT00] Let $M$ be a square matrix with coefficients in $\mathbb{Q}$. It is
decidable within polynomial time whether $M^k$ converges to 0 when $k$ tends to infinity.

Proof. (Sketch) First, $M^k$ converges to 0 when $k$ tends to infinity if and only if the
spectral radius $\rho(M)$ of $M$, i.e. the maximum of the magnitudes of its eigenvalues,
satisfies $\rho(M) < 1$.

Then, $M$ satisfies $\rho(M) < 1$ iff the Lyapunov equation

$$M P M^t - P = -I$$

has a positive-definite solution. In that case the solution is unique. Since the Lyapunov
equation is linear in the unknown entries of $P$, we can compute a solution $P$ in
polynomial time, or decide that it does not exist. To check that $P$ is positive definite, it is
sufficient to compute the leading principal minors of $P$ and check that they
are all positive. □
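The procedure sketched in this proof can be written down with exact rational arithmetic. The following is an illustrative sketch under a stated assumption: it uses the standard discrete-time form of the Lyapunov (Stein) equation, P - M P M^t = I, and tests positive definiteness through the pivots of Gaussian elimination, which are positive exactly when all leading principal minors are. The function names `solve`, `is_positive_definite` and `powers_vanish` are hypothetical.

```python
from fractions import Fraction


def solve(matrix, rhs):
    """Gauss-Jordan elimination over Q; returns None if the matrix is singular."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def is_positive_definite(p):
    """For a symmetric matrix: all elimination pivots must be positive."""
    a = [row[:] for row in p]
    n = len(a)
    for k in range(n):
        if a[k][k] <= 0:
            return False
        for r in range(k + 1, n):
            factor = a[r][k] / a[k][k]
            a[r] = [x - factor * y for x, y in zip(a[r], a[k])]
    return True


def powers_vanish(m):
    """Decide whether M^k -> 0 by solving P - M P M^t = I for P over Q."""
    n = len(m)
    m = [[Fraction(v) for v in row] for row in m]
    idx = [(a, b) for a in range(n) for b in range(n)]
    system, rhs = [], []
    for a, b in idx:
        # Equation for entry (a, b): P[a][b] - sum_{c,d} M[a][c] P[c][d] M[b][d]
        row = []
        for c, d in idx:
            coeff = -m[a][c] * m[b][d]
            if (a, b) == (c, d):
                coeff += 1
            row.append(coeff)
        system.append(row)
        rhs.append(Fraction(1 if a == b else 0))
    sol = solve(system, rhs)
    if sol is None:  # singular system: no positive-definite solution exists
        return False
    p = [[sol[a * n + b] for b in range(n)] for a in range(n)]
    return is_positive_definite(p)
```

The linear system has n^2 unknowns, so the whole test stays polynomial in the size of M; a production implementation would exploit the symmetry of P, which this sketch does not.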

Proposition 17. Let $A$ be an MA over $\mathbb{Q}$. It is decidable within polynomial time whether
the sum $\sum_k p_A(\Sigma^k)$ converges. If the sum $p_A(\Sigma^*) = \sum_k p_A(\Sigma^k)$ converges, it can
be computed within polynomial time.

Proof. Let $A = \langle \Sigma, Q, \varphi, \iota, \tau \rangle$ where $Q = \{q_1, \dots, q_n\}$ and let $M$ be the square
matrix defined by $M[i,j] = \sum_{x \in \Sigma} \varphi(q_i, x, q_j)$ for $1 \leq i, j \leq n$. We have $p_A(\Sigma^k) = \iota_A M^k \tau_A$ where
$\iota_A = (\iota(q_1), \dots, \iota(q_n))$ and $\tau_A = (\tau(q_1), \dots, \tau(q_n))^t$.

Let $E$ be the subspace of $\mathbb{R}^n$ spanned by $\{M^k \tau_A \mid k \in \mathbb{N}\}$ and let $F$ be a
complementary subspace of $E$ in $\mathbb{R}^n$. Let $H = \{u \in E \mid \forall k \in \mathbb{N}, \iota_A M^k u = 0\}$. Clearly, $E$
and $H$ are stable under $M$. Let $G$ be a complementary subspace of $H$ in $E$. For any
$u \in \mathbb{R}^n$, there exists a unique decomposition of the form $u = u_F + u_G + u_H$ where
$u_F \in F$, $u_G \in G$ and $u_H \in H$. Let $p_F$, $p_G$ and $p_H$ be the projections on $F$, $G$ and $H$
defined by $p_F(u) = u_F$, $p_G(u) = u_G$ and $p_H(u) = u_H$. Let $P_F$, $P_G$ and $P_H$ be the
corresponding matrices.
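First note that for any integer $k \geq 1$ and any $u \in E$, we have
$P_G M^k P_G u = (P_G M P_G)^k u$. This is clear when $k = 1$. We have

$$P_G M^{k+1} P_G u = P_G M^k (M P_G u)$$
$$= P_G M^k [P_H M P_G u + P_G M P_G u] \quad \text{since } M P_G u \in E$$
$$= P_G M^k P_G [P_G M P_G u] \quad \text{since } \forall v \in H,\ M^k v \in H \text{ and } P_G(v) = 0$$
$$= (P_G M P_G)^{k+1} u \quad \text{from induction hypothesis.}$$

Note also that for any integer $k$ and any $u \in E$,

$$\iota_A M^k u = \iota_A M^k (P_G u + P_H u) \quad \text{since } u \in E$$
$$= \iota_A M^k P_G u \quad \text{since } \forall v \in H,\ M^k v \in H \text{ and } \iota_A v = 0$$
$$= \iota_A (P_G M^k P_G u + P_H M^k P_G u) \quad \text{since } M^k P_G u \in E$$
$$= \iota_A P_G M^k P_G u \quad \text{since } \forall v \in H,\ \iota_A v = 0$$
$$= \iota_A (P_G M P_G)^k u.$$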

We show now that $\sum_{k \in \mathbb{N}} \iota_A M^k \tau_A$ is convergent iff $\lim_{k \to \infty} (P_G M P_G)^k = 0$.

- Suppose that $\lim_{k \to \infty} (P_G M P_G)^k = 0$. Then $Id - P_G M P_G$ is invertible and
$\sum_{k \in \mathbb{N}} (P_G M P_G)^k$ converges to $(Id - P_G M P_G)^{-1}$. Therefore, $\sum_{k \in \mathbb{N}} \iota_A M^k \tau_A$
converges to $\iota_A (Id - P_G M P_G)^{-1} \tau_A$.

- Suppose now that $\sum_{k \in \mathbb{N}} \iota_A M^k \tau_A$ is convergent.

There exists $\lambda > 0$ such that for all $u \in G$, there exists $n \in \mathbb{N}$ such that $|\iota_A M^n u| \geq \lambda \|u\|$.
Otherwise, there would exist a sequence $u_k$ of elements of $G$ such that for all
integers $n$, $|\iota_A M^n (u_k)| < \|u_k\|/k$. Let $v_k = u_k/\|u_k\|$ and let $(v_{\phi(k)})$ be a subsequence
which converges to $v$. Check that we should have $\|v\| = 1$, $v \in G$ and $\iota_A M^n v = 0$
for any integer $n$, which is impossible since $v \neq 0$.

Let $\lambda$ satisfy this property. For any integers $m$ and $k$, there exists an integer $n$ such that

$$|\iota_A M^n (P_G M^k P_G)(M^m \tau_A)| \geq \lambda \|(P_G M^k P_G)(M^m \tau_A)\| = \lambda \|(P_G M P_G)^k (M^m \tau_A)\|.$$

We have also

$$\iota_A M^n (P_G M^k P_G)(M^m \tau_A) = \iota_A (P_G M P_G)^n (P_G M^k P_G)(M^m \tau_A)$$
$$= \iota_A (P_G M P_G)^{n+k} (M^m \tau_A)$$
$$= \iota_A M^{n+k} (M^m \tau_A)$$
$$= \iota_A M^{n+k+m} \tau_A.$$

If we suppose that $\iota_A M^k \tau_A \to 0$ when $k \to \infty$, we must have $\|(P_G M^k P_G)(M^m \tau_A)\| \to 0$
when $k \to \infty$ for any integer $m$. As $\{M^m \tau_A\}$ generates $E$, $P_G M^k P_G$ converges
to 0.

To sum up, $\sum_k p_A(\Sigma^k)$ is bounded iff $(P_G M P_G)^k$ converges to 0, which is a
polynomially decidable problem (Lemma 6).

When the sum $\sum_k p_A(\Sigma^k)$ converges, it is equal to $\iota_A (Id - P_G M P_G)^{-1} \tau_A$ which
can be computed within polynomial time. □

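Example 2. Consider the MA $A''$ described on Fig. 3. We have

$$\iota_{A''} = (1, 0), \quad \tau_{A''} = (1/4, 1/4)^t \quad \text{and} \quad M = \frac{3}{4} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$

We have $M \tau_{A''} = (3/4)\, \tau_{A''}$ and therefore, $E$ is the vector space spanned by $\tau_{A''}$. Let $F$
be the complementary space of $E$ spanned by the vector $(1, -1)^t$; we have

$$H = \{0\}, \quad G = E, \quad P_G = \frac{1}{2} \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} \quad \text{and} \quad Id - P_G M P_G = \frac{1}{8} \begin{pmatrix} 5 & -3 \\ -3 & 5 \end{pmatrix}.$$

Check that the inverse of $Id - P_G M P_G$ is equal to

$$\frac{1}{2} \begin{pmatrix} 5 & 3 \\ 3 & 5 \end{pmatrix}$$

and that $\iota_{A''} (Id - P_G M P_G)^{-1} \tau_{A''} = 1$.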
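The computations of this example can be replayed with exact fractions. The sketch below is only illustrative: the matrix M is not legible in this copy, so it is taken to be (3/4) times the identity, an assumption consistent with the stated relation that M multiplies the terminal vector by 3/4 and with the displayed inverse. The names `matmul`, `inverse_2x2`, `N_INV`, `TOTAL` and `partial_sum` are hypothetical.

```python
from fractions import Fraction as F


def matmul(x, y):
    """Product of two matrices given as lists of rows."""
    return [
        [sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def inverse_2x2(m):
    """Inverse of an invertible 2x2 matrix by the adjugate formula."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


IOTA = [[F(1), F(0)]]                     # row vector iota
TAU = [[F(1, 4)], [F(1, 4)]]              # column vector tau
M = [[F(3, 4), F(0)], [F(0), F(3, 4)]]    # assumed: (3/4) * identity
P_G = [[F(1, 2), F(1, 2)], [F(1, 2), F(1, 2)]]  # projection on span(tau) along (1, -1)
IDENTITY = [[F(1), F(0)], [F(0), F(1)]]

PGMPG = matmul(matmul(P_G, M), P_G)
N = [[IDENTITY[i][j] - PGMPG[i][j] for j in range(2)] for i in range(2)]
N_INV = inverse_2x2(N)
TOTAL = matmul(matmul(IOTA, N_INV), TAU)[0][0]


def partial_sum(terms: int) -> F:
    """sum_{k < terms} iota M^k tau: the mass of the words of length < terms."""
    total, vec = F(0), TAU
    for _ in range(terms):
        total += matmul(IOTA, vec)[0][0]
        vec = matmul(M, vec)
    return total
```

With these values `TOTAL` is the closed-form sum given by Proposition 17, and `partial_sum` shows the same quantity being approached term by term.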

We prove now that it is undecidable whether a multiplicity automaton over $\mathbb{Q}$ generates a
stochastic language. In order to prove this result, we use a reduction from a decision
problem about acceptor PAs.

An MA $\langle \Sigma, Q, \varphi, \iota, \tau \rangle$ is an acceptor PA if

- $\varphi$, $\iota$ and $\tau$ are non-negative functions,

- $\forall q \in Q, \forall x \in \Sigma, \sum_{r \in Q} \varphi(q, x, r) = 1$,

- there exists a unique terminal state $t$ and $\tau(t) = 1$.

Blondel and Canterini have shown that given an acceptor PA $A$ over $\mathbb{Q}$ and $\lambda \in \mathbb{Q}$,
it is undecidable whether there exists a word $w$ such that $r_A(w) < \lambda$ ([BC03]).

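Theorem 5. It is undecidable whether an MA over $\mathbb{Q}$ generates a stochastic language.

Proof. For any rational series $r$ over $\Sigma$, let us denote by $\bar{r}$ the rational series defined
by

$$\bar{r}(w) = \frac{r(w)}{(|\Sigma| + 1)^{|w|+1}}.$$

Let $A = \langle \Sigma, Q, \varphi, \iota, \tau \rangle$ be an acceptor PA over $\mathbb{Q}$ and let $\lambda \in \mathbb{Q}$. Let
$B = \langle \Sigma, Q, \varphi_B, \iota, \tau_B \rangle$ be the MA defined by $\varphi_B(q, x, q') = \varphi(q, x, q')/(|\Sigma| + 1)$ and
$\tau_B(q) = \tau(q)/(|\Sigma| + 1)$ for any states $q, q' \in Q$ and any $x \in \Sigma$. Remark that $B$ is a
semi-PA and that $r_B = \bar{r}_A$.

The sum $\bar{s} = \sum_{w \in \Sigma^*} \bar{r}_A(w)$ is bounded by 1 from Prop. 2 and can be computed
within polynomial time by using Prop. 17. Let $c_\lambda$ be the series defined by $c_\lambda(w) = \lambda$
for any word $w \in \Sigma^*$.

- If $\bar{s} < \lambda$, then there must exist a word $w$ such that $r_A(w) < \lambda$ since

$$\sum_{w \in \Sigma^*} \frac{r_A(w)}{(|\Sigma| + 1)^{|w|+1}} = \bar{s} < \lambda = \sum_{w \in \Sigma^*} \frac{\lambda}{(|\Sigma| + 1)^{|w|+1}}.$$

- If $\bar{s} = \lambda$, the rational series $1 + \bar{r}_A - \bar{c}_\lambda$ is a stochastic language iff $r_A(w) \geq \lambda$ for
any word $w$.

- If $\bar{s} > \lambda$, the rational series $(\bar{s} - \lambda)^{-1} \cdot (\bar{r}_A - \bar{c}_\lambda)$ is a stochastic language iff $r_A(w) \geq \lambda$
for any word $w$.

Since in the two last cases, a multiplicity automaton which generates $1 + \bar{r}_A - \bar{c}_\lambda$ (resp.
$(\bar{s} - \lambda)^{-1} \cdot (\bar{r}_A - \bar{c}_\lambda)$) can easily be derived from $A$, an algorithm able to decide whether an
MA generates a stochastic language could be used to solve the decision problem on
PA acceptors. □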

A reduction from the following undecidable problem could also have been used: it is
undecidable whether a rational series over $\mathbb{Z}$ takes a negative value [SS78].

The set of multiplicity automata over $\mathbb{Q}$ which generate stochastic languages is not
only not recursive: it contains no recursively enumerable set able to generate $\mathcal{S}^{rat}_{\mathbb{Q}}(\Sigma)$.

Theorem 6. No recursively enumerable set of multiplicity automata over $\mathbb{Q}$ exactly
generates $\mathcal{S}^{rat}_{\mathbb{Q}}(\Sigma)$.

Proof. From Prop. 17, the set $\mathcal{A}$ composed of the multiplicity automata $A$ over $\mathbb{Q}$
which satisfy $p_A(\Sigma^*) = 1$ is recursively enumerable.

The subset $\mathcal{B}$ composed of the elements of $\mathcal{A}$ which satisfy

$$\exists w \in \Sigma^*,\ p_A(w) < 0$$

is recursively enumerable.

Suppose that there exists a recursive enumeration $R_0, \dots, R_n, \dots$ of multiplicity
automata over $\mathbb{Q}$ sufficient to generate $\mathcal{S}^{rat}_{\mathbb{Q}}(\Sigma)$ and let $w_0, \dots, w_n, \dots$ be an
enumeration of $\Sigma^*$.

Consider the following algorithm:

    Input: a multiplicity automaton A over Q
    If p_A(Sigma*) = 1 then
        For i >= 0 do
            If p_A(w_i) < 0 then output NO; exit; EndIf
            If A is equivalent to R_i then output YES; exit; EndIf
        EndFor
    Else
        output NO; exit
    EndIf

Since the equality $\sum_{w \in \Sigma^*} p_A(w) = 1$ and the equivalence of two multiplicity
automata can be decided, this algorithm would end on any input and decide whether
$A$ generates a stochastic language. Therefore, the enumeration $R_0, \dots, R_n, \dots$ cannot
exist. □

4.2 Probabilistic automata

So, $\mathcal{S}^{rat}_{\mathbb{Q}}(\Sigma)$ and $\mathcal{S}^{rat}_{\mathbb{R}}(\Sigma)$ cannot be identified by any efficient subclass of multiplicity
automata. On the other hand, $\mathcal{S}^{rat}_{\mathbb{Q}^+}(\Sigma)$ and $\mathcal{S}^{rat}_{\mathbb{R}^+}(\Sigma)$ can be described by probabilistic
automata which form an easily identifiable subclass of multiplicity automata.

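Proposition 18. Let $K \in \{\mathbb{R}^+, \mathbb{Q}^+\}$ and let $p \in K\langle\langle\Sigma\rangle\rangle$. Then, $p$ is a rational stochastic
language over $K$ iff there exists a $K$-probabilistic automaton $A$ such that $p = r_A$.

Proof. The only thing to prove is that if $p \in \mathcal{S}^{rat}_K(\Sigma)$ then there exists a $K$-probabilistic
automaton $A$ such that $p = r_A$.

From Theorem 4, there exists a finite subset $S$ of $\mathcal{S}(\Sigma)$ which generates a stable
subsemimodule of $K\langle\langle\Sigma\rangle\rangle$ and such that $p \in conv_K(S)$. Suppose that $S$ is minimal
for inclusion. For any $s, s' \in S$ and any $x \in \Sigma$, let $\alpha_s$ and $\alpha^x_{s,s'} \in K$ be such that

$$p = \sum_{s \in S} \alpha_s s \quad \text{and} \quad \dot{x}s = \sum_{s' \in S} \alpha^x_{s,s'} s'.$$

Let $A = \langle \Sigma, S, \varphi, \iota, \tau \rangle$ be the MA defined by:

- $\iota(s) = \alpha_s$,

- $\tau(s) = s(\varepsilon)$,

- $\varphi(s, x, s') = \alpha^x_{s,s'}$,

for any $s, s' \in S$ and any $x \in \Sigma$. From Claims 1 and 2, $p = r_A$.

Since $S \subseteq \mathcal{S}(\Sigma)$, every state of $A$ is co-accessible and since $S$ is minimal, every
state of $A$ is accessible. Therefore, $A$ is trimmed.

Note that $\sum_{s \in S} \iota(s) = \sum_{s \in S} \alpha_s = 1$ since elements of $\{p\} \cup S$ are stochastic
languages. For any $s \in S$,

$$\tau(s) + \sum_{x \in \Sigma,\, s' \in S} \varphi(s, x, s') = s(\varepsilon) + \sum_{x \in \Sigma,\, s' \in S} \alpha^x_{s,s'}$$
$$= s(\varepsilon) + \sum_{x \in \Sigma} \dot{x}s(\Sigma^*)$$
$$= s(\varepsilon) + \sum_{x \in \Sigma} s(x\Sigma^*)$$
$$= 1.$$

Then, $A$ is a PA. □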



4.3 Probabilistic residual automata 

For any $K \in \{\mathbb{R}^+, \mathbb{Q}^+\}$, the class $\mathcal{S}^{fingen}_K(\Sigma)$ can be described by probabilistic
residual automata.

Proposition 19. Let $K \in \{\mathbb{R}^+, \mathbb{Q}^+\}$ and let $p \in K\langle\langle\Sigma\rangle\rangle$. Then, $p$ is a stochastic
language over $K$ whose residual subsemimodule is finitely generated iff there exists a
$K$-probabilistic residual automaton $A$ such that $p = r_A$.

Proof. - Let $p \in \mathcal{S}^{fingen}_K(\Sigma)$ and let $w_1, \dots, w_n \in res(p)$ be such that
$S = \{w_1^{-1}p, \dots, w_n^{-1}p\}$ generates $[Res(p)]$. Let $A$ be the MA associated with $S$ as
in the proof of Prop. 18. Check that $A$ is a PRA which generates $p$.

- Let $A = \langle \Sigma, Q, \varphi, \iota, \tau \rangle$ be a PRA which generates $p$ and for any $q \in Q$, let $w_q \in \Sigma^*$
be such that $r_{A,q} = w_q^{-1}p$. From Claim 3, $\{w_q^{-1}p \mid q \in Q\}$ generates a stable
subsemimodule $M$ which contains $p$. Check that $[Res(p)] = M$.

□

Remark that from Prop. 16, there exists a unique minimal subset $S$ of $Res(p)$
which generates $[Res(p)]$. A PRA based on this set has a minimal number of states.

We do not know whether the class of PRA is decidable. However, we show that
the class of $\mathbb{R}^+$-reduced PRA is decidable. Since any PRA is equivalent to a reduced PRA,
this class is sufficient to generate $\mathcal{S}_K^{fingen}(\Sigma)$.

Let $A$ be a PA and let $\langle \Sigma, Q, \delta, Q_I, Q_T \rangle$ be the support of $A$. If for any state
$q \in Q$, there exists a word $w_q$ such that $\delta(Q_I, w_q) = \{q\}$, then $A$ is a PRA since
$w_q^{-1}r_A = r_{A,q}$. The converse is true when $A$ is reduced.

Proposition 20. Let $A$ be an $\mathbb{R}^+$-reduced PA and let $\langle \Sigma, Q, \delta, Q_I, Q_T \rangle$ be the support
of $A$. Then, $A$ is a PRA if and only if for any state $q \in Q$, there exists a word $w$ such
that $\delta(Q_I, w) = \{q\}$.

Proof. Suppose that $A$ is a PRA. Let $q \in Q$ and let $w$ be a word such that $w^{-1}r_A = r_{A,q}$.
Let $Q_w = \delta(Q_I, w)$. There exist $(\alpha_{q'})_{q' \in Q_w}$ such that $w^{-1}r_A = \sum_{q' \in Q_w} \alpha_{q'} r_{A,q'}$.
Since $q \in Q_w$, $(1 - \alpha_q) r_{A,q} = \sum_{q' \in Q_w,\, q' \neq q} \alpha_{q'} r_{A,q'}$. Since $A$ is $\mathbb{R}^+$-reduced, we
must have $\alpha_q = 1$ and therefore, $Q_w = \{q\}$. □

Corollary 1. It can be decided whether an $\mathbb{R}^+$-reduced MA is a PRA.

Proof. It can easily be decided whether an MA is a PA. Then, the power set construction
can be used to check whether each state is the unique state reached by some word.

□ 
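
The power set check used in this proof can be sketched as follows. This is a minimal illustration under assumptions of our own: the support NFA is encoded as a dictionary `delta` mapping `(state, letter)` to a set of states, and the function names are hypothetical. The search explores the subsets reachable from $Q_I$, so its running time may be exponential in the number of states; by Proposition 20, the answer characterizes PRA only when the PA is $\mathbb{R}^+$-reduced.

```python
from collections import deque

def uniquely_reachable_states(alphabet, delta, initial):
    """Return the states q such that delta(Q_I, w) = {q} for some word w.

    delta: dict (state, letter) -> set of states (missing entries = empty set).
    Explores the subsets reachable from Q_I (power set construction).
    """
    start = frozenset(initial)
    seen = {start}
    queue = deque([start])
    singletons = set()
    while queue:
        subset = queue.popleft()
        if len(subset) == 1:
            singletons |= subset
        for x in alphabet:
            nxt = frozenset(q2 for q in subset for q2 in delta.get((q, x), ()))
            # The empty subset can never lead to a singleton, so it is skipped.
            if nxt and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return singletons

def every_state_uniquely_reachable(states, alphabet, delta, initial):
    """True iff each state is the unique state reached by some word."""
    return uniquely_reachable_states(alphabet, delta, initial) == set(states)
```

For instance, with `delta = {(0, "a"): {0, 1}, (1, "a"): {1}}` and initial set `{0}`, state `1` is never reached alone, so the check fails for the state set `[0, 1]`.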

From Prop. 6, it can efficiently be decided whether an MA is an $\mathbb{R}^+$-reduced PA.
But unfortunately, no efficient decision procedure exists to decide whether it is an
$\mathbb{R}^+$-reduced PRA: the decision problem is PSPACE-complete.

Proposition 21. Deciding whether an $\mathbb{R}^+$-reduced PA is a PRA is PSPACE-complete.

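Proof. We prove the proposition by reduction from the following PSPACE-complete
problem: given $n$ DFA $A^1, \ldots, A^n$ over $\Sigma$, where $L_i$ denotes the language recognized by $A^i$
for $1 \le i \le n$, decide whether $\bigcup_{i=1}^n L_i = \Sigma^*$.

Let $A^i = \langle \Sigma, Q^i, \{q_0^i\}, Q_T^i, \delta^i \rangle$ for $1 \le i \le n$, where $i \neq j$ implies that $Q^i \cap Q^j = \emptyset$.
We may suppose that $L_i \neq \emptyset$ for $1 \le i \le n$. Consider 3 new states $q_0, q_1, q_f$ and $n + 1$
new letters $x_1, \ldots, x_n, \lambda$. Let $A = \langle \Sigma_A, Q_A, Q_I, Q_T, \delta \rangle$ be the NFA defined by:

- $\Sigma_A = \Sigma \cup \{x_1, \ldots, x_n, \lambda\}$,

- $Q_A = \bigcup_{i=1}^n Q^i \cup \{q_0, q_1, q_f\}$,

- $Q_I = \{q_0, q_0^1, \ldots, q_0^n\}$,

- $Q_T = \{q_1, q_f\}$,

- for any $1 \le i, j \le n$, any $q \in Q^i$ and any $x \in \Sigma$,

• $\delta(q, x) = \delta^i(q, x)$,

• $\delta(q, x_j) = \{q\}$ if $i = j$ and $\emptyset$ otherwise,

• $\delta(q, \lambda) = \{q_f\}$ if $q \in Q_T^i$ and $\emptyset$ otherwise,

- for any $x \in \Sigma$, $\delta(q_0, x) = \{q_0\}$, $\delta(q_1, x) = \emptyset$ and $\delta(q_f, x) = \emptyset$,

- $\delta(q_0, \lambda) = \{q_1\}$, $\delta(q_1, \lambda) = \{q_0\}$ and $\delta(q_f, \lambda) = \{q_0^1, \ldots, q_0^n\}$.

Check that for any $q \in \bigcup_{i=1}^n Q^i \cup \{q_f\}$, there exists a word $w_q$ such that $\delta(Q_I, w_q) = \{q\}$.
If there exists a word $w_0$ such that $\delta(Q_I, w_0) = \{q_0\}$ then $\delta(Q_I, w_0\lambda) = \{q_1\}$.

Now, suppose that $\bigcup_{i=1}^n L_i \neq \Sigma^*$ and let $u \in \Sigma^* \setminus \bigcup_{i=1}^n L_i$. Then
$\delta(Q_I, u) \cap \bigcup_{i=1}^n Q_T^i = \emptyset$ and therefore, $\delta(Q_I, u\lambda) = \{q_1\}$ and $\delta(Q_I, u\lambda\lambda) = \{q_0\}$.

If $\bigcup_{i=1}^n L_i = \Sigma^*$, then for any $u \in \Sigma^*$, $\delta(Q_I, u) \cap \bigcup_{i=1}^n Q_T^i \neq \emptyset$, $\delta(Q_I, u\lambda) = \{q_1, q_f\}$,
$\delta(Q_I, u\lambda\Sigma) = \emptyset$ and $\delta(Q_I, u\lambda\lambda) = Q_I$. Therefore, there exists no word $w_0$ such that
$\delta(Q_I, w_0) = \{q_0\}$.

That is, $\bigcup_{i=1}^n L_i \neq \Sigma^*$ if and only if for any $q \in Q_A$, there exists a word $w_q \in \Sigma_A^*$
such that $\delta(Q_I, w_q) = \{q\}$.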

Now, associate a new letter $y_q$ to each state $q \in Q_A$ and consider the MA
$B = \langle \Sigma_B, Q_B, \iota, \tau, \varphi \rangle$ where

- $\Sigma_B = \Sigma_A \cup \{y_q \mid q \in Q_A\}$,

- $Q_B = Q_A \cup \{q_b\}$,

- $\iota(q) = 1/(n + 1)$ if $q \in Q_I$ and $0$ otherwise,

- $\tau(q) = 1$ if $q = q_b$ and $0$ otherwise,

- $\varphi(q, x, q') = 1/(\sum_{y \in \Sigma_A} |\delta(q, y)| + 1)$ if $q \in Q_A$, $x \in \Sigma_A$ and $q' \in \delta(q, x)$,

- $\varphi(q, y_q, q_b) = 1/(\sum_{y \in \Sigma_A} |\delta(q, y)| + 1)$,

- $\varphi(q, x, q') = 0$ in all other cases.

Check that $B$ is a PA. $B$ is $\mathbb{R}^+$-reduced since for any $q, q' \in Q_A$, $r_{B,q}(y_{q'}) \neq 0$ iff $q = q'$,
and $r_{B,q}(\varepsilon) = 0$. $B$ is a PRA if and only if for any $q \in Q_A$, there exists a word
$w_q \in \Sigma_A^*$ such that $\delta(Q_I, w_q) = \{q\}$.

Putting it all together, we see that an algorithm which decides whether $B$ is a PRA
could be used to decide whether $\bigcup_{i=1}^n L_i \neq \Sigma^*$.

As the problem is clearly in PSPACE, it is PSPACE-complete. □
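
The NFA of the reduction can be built and explored on small instances. The Python sketch below is an illustration under explicit assumptions: it reads the construction as $Q_I = \{q_0, q_0^1, \ldots, q_0^n\}$ and $\delta(q, x_i) = \{q\}$ for $q \in Q^i$; the DFA are complete, have pairwise disjoint state names that differ from the strings `"q0"`, `"q1"`, `"qf"`; and the dictionary encoding, the letter names `"lam"` and `("x", i)`, and all function names are hypothetical. It only tests whether $\{q_0\}$ is reachable, which is the part of the argument that depends on $\bigcup_i L_i$.

```python
def build_reduction_nfa(sigma, dfas):
    """Build the NFA of the reduction from a list of complete DFA.

    Each DFA is a dict with keys 'states', 'init', 'final', 'delta', where
    delta maps (state, letter) -> state. 'lam' stands for the new letter
    lambda and ('x', i) for the new letter x_i (indices start at 0 here).
    """
    letters = list(sigma) + [("x", i) for i in range(len(dfas))] + ["lam"]
    delta = {}
    for i, d in enumerate(dfas):
        for q in d["states"]:
            for a in sigma:
                delta[(q, a)] = {d["delta"][(q, a)]}
            delta[(q, ("x", i))] = {q}      # x_i keeps only states of the i-th DFA
            if q in d["final"]:
                delta[(q, "lam")] = {"qf"}  # final states go to q_f on lambda
    for a in sigma:
        delta[("q0", a)] = {"q0"}
    delta[("q0", "lam")] = {"q1"}
    delta[("q1", "lam")] = {"q0"}
    delta[("qf", "lam")] = {d["init"] for d in dfas}
    initial = {"q0"} | {d["init"] for d in dfas}
    return letters, delta, initial

def reachable_subsets(letters, delta, initial):
    """Non-empty subsets delta(Q_I, w) reachable by some word w."""
    start = frozenset(initial)
    seen, stack = {start}, [start]
    while stack:
        s = stack.pop()
        for a in letters:
            t = frozenset(p for q in s for p in delta.get((q, a), ()))
            if t and t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

def q0_uniquely_reachable(sigma, dfas):
    letters, delta, initial = build_reduction_nfa(sigma, dfas)
    return frozenset({"q0"}) in reachable_subsets(letters, delta, initial)

# Toy instance over {a}: words of even length and words of odd length.
even = {"states": ["e0", "e1"], "init": "e0", "final": {"e0"},
        "delta": {("e0", "a"): "e1", ("e1", "a"): "e0"}}
odd = {"states": ["o0", "o1"], "init": "o0", "final": {"o1"},
       "delta": {("o0", "a"): "o1", ("o1", "a"): "o0"}}
```

With `[even]` alone the union is not $\Sigma^*$ and $\{q_0\}$ is reachable; with `[even, odd]` the union is $\Sigma^*$ and it is not.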

It has been shown in [DLT02] that for any polynomial $p(\cdot)$, there exists an NFA
$A = \langle \Sigma_A, Q, Q_I, Q_T, \delta \rangle$ which satisfies the following properties:

- for any state $q$ of $A$, there exists a word $w \in \Sigma_A^*$ such that $\delta(Q_I, w) = \{q\}$,

- for any state $q$ of $A$, all words $w$ which satisfy $\delta(Q_I, w) = \{q\}$ have a length
greater than $p(|Q|)$.

These NFA are supports of PRA which inherit this property.

So, reduced PRA form a decidable family which is sufficient to generate $\mathcal{S}_K^{fingen}(\Sigma)$
but the membership problem for this family is not polynomial. We can restrict this family
to obtain a polynomially decidable family which is still sufficient to generate $\mathcal{S}_K^{fingen}(\Sigma)$.

Let $A = \langle \Sigma, Q, \iota, \tau, \varphi \rangle$ be a PRA. $A$ is prefixial if for any $q \in Q$, there exists
$w_q \in \Sigma^*$ such that $w_q^{-1}r_A = r_{A,q}$ and such that $\{w_q \mid q \in Q\}$ is prefixial.

It is polynomially decidable whether an MA is a prefixial PRA. 

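Let $A = \langle \Sigma, Q, \iota, \tau, \varphi \rangle$ be a PRA, and for any $q \in Q$, let $w_q \in \Sigma^*$ be such that
$w_q^{-1}r_A = r_{A,q}$. Let $W = \{w_q \mid q \in Q\}$ and let $\overline{W}$ be the smallest prefixial subset of
$\Sigma^*$ which contains $W$. Let $B = \langle \Sigma, \overline{W}, \overline{\iota}, \overline{\tau}, \overline{\varphi} \rangle$ be the MA defined by:

- $\overline{\iota}(w) = 1$ if $w = \varepsilon$ and $0$ otherwise,

- $\overline{\tau}(w) = w^{-1}r_A(\varepsilon)$,

- $\overline{\varphi}(w, x, wx) = w^{-1}r_A(x\Sigma^*)$ for any $x \in \Sigma$ such that $wx \in \overline{W}$,

- $\overline{\varphi}(w_q, x, w_{q'}) = \varphi(q, x, q')$ if $w_q x \notin \overline{W}$,

- $\overline{\varphi}(w, x, w') = 0$ in all other cases.

It can be shown that $B$ is a prefixial PRA equivalent to $A$.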
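
The set $\overline{W}$ used in this construction is easy to compute. The following Python sketch assumes that "prefixial" means closed under taking prefixes, and that words are encoded as Python strings; the function names are hypothetical.

```python
def prefix_closure(words):
    """Smallest prefix-closed set of words containing `words`."""
    return {w[:k] for w in words for k in range(len(w) + 1)}

def is_prefixial(words):
    """True iff every prefix of a word of the set is in the set."""
    words = set(words)
    return prefix_closure(words) == words
```

For example, `prefix_closure({"ab", "b"})` adds the empty word and `"a"` to the two given words. The closure has at most $1 + \sum_{w \in W} |w|$ elements, so the state set of $B$ stays polynomial in the total length of the chosen words.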

4.4 Probabilistic deterministic automata

For any $K \in \{\mathbb{R}, \mathbb{Q}, \mathbb{R}^+, \mathbb{Q}^+\}$, the class $\mathcal{S}_K^{fin}(\Sigma)$ can be described by probabilistic
deterministic automata.

Proposition 22. Let $K \in \{\mathbb{R}, \mathbb{Q}, \mathbb{R}^+, \mathbb{Q}^+\}$ and let $p \in K\langle\langle\Sigma\rangle\rangle$. Then, $p$ is a stochastic
language over $K$ which has finitely many residual languages iff there exists a
$K$-probabilistic deterministic automaton $A$ such that $p = r_A$.

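Proof. From Prop. 14, we can suppose that $K \in \{\mathbb{R}^+, \mathbb{Q}^+\}$.

- Let $p \in \mathcal{S}_K^{fin}(\Sigma)$ and let $Res(p) = \{w_1^{-1}p, \ldots, w_n^{-1}p\}$. Let $A$ be the MA associated
with $S = Res(p)$ as in the proof of Prop. 18. As there exists $i \in \{1, \ldots, n\}$ such that
$p = w_i^{-1}p$, we can suppose that $\alpha_s = 1$ if $s = w_i^{-1}p$ and $0$ otherwise. Now let $s = w_i^{-1}p$
be an arbitrary element of $Res(p)$ and let $x \in \Sigma$.
If $x \notin res(s)$, then $\sum_{w \in \Sigma^*} p(w_i x w) = 0$ and since $K \in \{\mathbb{R}^+, \mathbb{Q}^+\}$, this implies
that $p(w_i x w) = 0$ for any word $w$. Therefore, in this case, it is possible to choose
$\alpha^x_{s,s'} = 0$ for any $s' \in Res(p)$. When $x \in res(s)$, there exists $j \in \{1, \ldots, n\}$
such that $x^{-1}s = w_j^{-1}p$. In this case, since $\dot{x}s = s(x\Sigma^*)\, x^{-1}s$, we can choose
$\alpha^x_{s,s'} = s(x\Sigma^*)$ if $s' = w_j^{-1}p$ and $0$ otherwise.

Then, check that $A$ is a PDA which generates $p$.

- Let $A = \langle \Sigma, Q, \varphi, \iota, \tau \rangle$ be a PDA which generates $p$ and let $Q_I = \{q_0\}$. For any
$w \in \Sigma^*$, there exists at most one state $q \in Q$ such that $\varphi(q_0, w, q) \neq 0$. Therefore,
$Res(p) \subseteq \{r_{A,q} \mid q \in Q\}$ and $Res(p)$ is a finite set.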

□ 
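
Determinism makes evaluation straightforward: each word follows at most one path from the initial state. The Python sketch below illustrates this under assumptions of our own (a dictionary encoding in which `trans` maps `(state, letter)` to a pair `(next_state, weight)`, and hypothetical names); it is not a construction taken from the paper.

```python
from fractions import Fraction

def pda_probability(init, tau, trans, word):
    """Probability of `word` in a PDA: follow the unique path from `init`.

    trans: dict (state, letter) -> (next_state, weight); tau: state -> weight.
    Returns 0 as soon as a transition is missing.
    """
    q, prob = init, Fraction(1)
    for x in word:
        if (q, x) not in trans:
            return Fraction(0)
        q, w = trans[(q, x)]
        prob *= w
    return prob * tau.get(q, 0)

# Hypothetical one-state PDA over {a}: stop with probability 1/2, else read a.
tau = {"q0": Fraction(1, 2)}
trans = {("q0", "a"): ("q0", Fraction(1, 2))}
```

The state reached after reading $w$ plays the role of the residual language $w^{-1}p$, which is why a PDA has at least as many states as $p$ has residual languages.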



Fig. 8. Inclusion relations between classes of rational stochastic languages.

5 Conclusion



In this paper, we have carried out a systematic study of rational stochastic languages,
which are precisely the objects that probabilistic grammatical inference deals with. This
study, and the results we bring out, whether they are original or derived from former
contributions, support our opinion that research in grammatical inference should
be based on formal language theory. Doing this makes it possible to reuse
powerful tools and general results for inference purposes. Moreover, this approach may
help to find out which particular properties are important for grammatical inference. For
example, a learning sample $\{w_1, \ldots, w_n\}$ independently drawn according to a target
stochastic language $p$ provides statistical information on the residual languages of $p$.
In order to infer an approximation of $p$ by means of a multiplicity automaton $A$, there
should be a structural link between the states of $A$ and the observed data and hence,
between the states of $A$ and the residual languages of $p$. This explains why most results
in grammatical inference deal with PDA and PRA, i.e. classes of multiplicity automata
for which there exists a strong connection between the states and the residual languages
of the stochastic languages they generate. This also explains why there is no useful
general inference result about PA: the residual subsemimodule of a rational stochastic
language over $\mathbb{R}^+$ or $\mathbb{Q}^+$ may not be finitely generated and hence, no finite set of
residual languages can be used to represent it. Moreover, PA admit no natural normal
form. On the other hand, the residual subsemimodules of rational stochastic languages
over $\mathbb{R}$ or $\mathbb{Q}$ are finitely generated and admit a basis made of residual languages. Even
if there exists no recursively enumerable subset of MA capable of generating them,
this study has encouraged us to try to find a way to infer these most general stochastic
languages. See [DEH06] for preliminary results. We are also currently working on tree
rational stochastic languages, following a similar approach, in order to deal with the
inference of probabilistic tree languages. This work is still in progress.